
AN INTRODUCTORY GUIDE TO SHAZAM

PART I - Learning the Basics


Search the SHAZAM Guide
A Brief History of SHAZAM
Examples to accompany the SHAZAM User's Reference Manual - Version 10
Examples to accompany:
    Principles of Econometrics (Third Edition) by Hill, Griffiths and Lim
    The Practice of Econometrics by Berndt
    Econometric Analysis by Greene
    Undergraduate Econometrics by Hill, Griffiths and Judge
    Introductory Econometrics by Wooldridge
SHAZAM Terms
    Variables
    File
    Data file
    Commands - SAMPLE, READ, GENR and more
    Comment statement
    Program (command file)
    Output file
    Memory
    PAR
    Error messages
    Rounding errors
Getting Started
    How to Run SHAZAM
    Using SHAZAM as a Calculator
Statistics
    Mean, Variance, Median, Quartiles
    Covariance and Correlation
    Computing Probabilities for Normal Random Variables
Simulation Experiments
More Data Analysis

PART II - Practicing Econometrics


Ordinary Least Squares
    Ordinary Least Squares Regression
    Comparing linear vs. log-linear models
    Confidence Intervals
    Hypothesis Testing
    Estimation with restrictions
    Prediction
Special Topics
    Working with lagged variables
    Using trend variables
    Dummy Variables - Modelling structural change, seasonality and more
Diagnostic Testing
    Testing for Heteroskedasticity
    Testing for Autocorrelation
    Testing for Structural Stability - the Chow Test
Generalized Least Squares
    Estimation of models with Heteroskedastic errors
    Estimation of models with Autoregressive errors
    Pooling Time-Series Cross-Section Data
Sets of Linear Equations - SUR Estimation
Maximum Likelihood Estimation
Logit and Probit Analysis
ARCH and GARCH Models
Time Series Analysis
    Index Numbers
    Moving Averages and Exponential Smoothing
    Financial Time Series

PART I - Learning the Basics

Variables


A variable contains a set of observations (numeric data). A common type of variable is a macroeconomic time series observed at equally spaced time intervals such as annual, quarterly or monthly. A variable could also contain cross-sectional data observed on individuals, households, industries or regions collected at one point in time.

Variable names

Each variable has a name assigned by the user. SHAZAM variable names may consist of letters and numbers and must start with a letter. Variable names may be up to 8 characters long.

What is a good variable name?

Textbooks in statistics and econometrics often talk about variables like Y and X1, X2, X3. These are general names that give no descriptive information. A sensible approach is to assign a variable name that is descriptive of the type of data.

Example

For the Theil textile data set, SHAZAM variable names that may be appropriate to assign are:

1. YEAR
2. CONSUME
3. INCOME
4. PRICE

Alternatively, the following variable names could be used for this data set:

1. T
2. V1
3. V2
4. V3

Notes

Once a variable name has been set it must always be referred to by this name. For example, if a variable is assigned the name CONSUME it cannot be referred to as CONS or CNSM. If spelling errors are made then SHAZAM error messages will result. In general, upper case and lower case names are interchangeable. However, if you wish to distinguish upper and lower case variables and file names in SHAZAM then use the command: SET NOLCUC.

File

A file is a collection of information. This is the basic unit of storage on the computer disk. SHAZAM makes use of files for input of information and output of results. Input files (such as data files and SHAZAM command files) must be created and/or managed by the user. Input files can be created and modified with a text editor.

Text Editor

An editor is a program that is like a word processor but much simpler and with fewer features. Windows SHAZAM provides an editor window for preparing SHAZAM command files. Alternatively, examples of editors that can be used are:
Windows: Notepad
Macintosh: SimpleText

Note: As a general rule do not use the TAB key when preparing data files or SHAZAM command files. Use the space bar to line things up. If a word processor is used to create data files and SHAZAM command files, the file must be saved in ASCII (.txt extension) format. In the word processor, from the File menu select Save as ... and use the Text only option to save the file.

Data file

A SHAZAM data file contains a set of numeric observations on a group of variables.



Sources of data files

Data files may be obtained by on-line retrieval from the internet. Attention must be given to respecting licence agreements and acknowledging data sources.

Data may be collected from a survey. In this case, it may be necessary to type the data into a data file. A text editor can be used to prepare the data file.

Data files may be provided by other researchers. Some academic journals maintain archives of data sets that have been used in publications.

Data files may be created by SHAZAM. Variables can be constructed with SHAZAM commands and the WRITE command can be used to write the new data set to a data file.

Examples
1. The Theil textile data set
2. A household food expenditure data set

Rules for preparing SHAZAM data files

The standard format for a SHAZAM data file requires that the file be prepared as a plain text file with numbers separated by spaces or commas. Free format is allowed. That is, there are no constraints on column position.

Note that a comma is treated as a separator. Therefore, the number 12,560 will be interpreted as two numbers: 12 and 560. For correct interpretation by SHAZAM, commas in numeric data should be removed. This can be done in an editor with a global edit change.

In general, there must be no descriptive information and no special characters of any kind embedded in the data file (an exception to this is when the FORMAT command is used - see below). Data documentation can be placed as a header to the file or at the very end of the file (this is discussed in further detail in the section on comment statements).

Spreadsheet data files can be used with one of the following methods:

1. Convert the spreadsheet to a plain text file (an ASCII file) by using the Save As ... option from the File menu.
2. Save the spreadsheet in DIF format. DIF files can be loaded with the SHAZAM READ command. Instructions are in the SHAZAM User's Reference Manual.
3. Microsoft Excel XLS files can be read by SHAZAM. Instructions are available.

Two data preparation styles are permitted:


1. NOBYVAR - Observation by observation. Observations for all variables begin on a new line. The Theil textile data set is prepared in this way.
2. BYVAR - Variable by variable. All observations for each new variable begin on a new line.

Example: Consider a data set with observations on height (inches) and weight (pounds) for 6 individuals (this example is from Chotikapanich and Griffiths, 1993). The data set prepared with the observation by observation (NOBYVAR) method is:

69 112
56 84
62 102
67 135
70 165
63 120

The data set prepared with the BYVAR (variable by variable) method is:

69 56 62 67 70 63
112 84 102 135 165 120

Data can be prepared in more than one data file. Then, multiple READ commands can be used to load the complete data set into SHAZAM memory.

Special formats

Character data in data files can be read using the FORMAT command. When this command is used the data cannot be in free format. More details are in the SHAZAM User's Reference Manual.

Missing values

Missing values should be assigned a numeric missing value code.

References

D. Chotikapanich and W.E. Griffiths, Learning SHAZAM: A Computer Handbook for Econometrics, Wiley, 1993.

SHAZAM Commands

Commands give instructions to SHAZAM. Commands tell SHAZAM to perform a certain task. Many SHAZAM commands also have options that instruct SHAZAM to perform additional tasks. Options are listed after a / character and can be specified in any order. A complete description of SHAZAM commands is given in the SHAZAM User's Reference Manual. Some important SHAZAM commands that are introduced here are:

The SAMPLE command
The READ command
The GENR command
The GEN1 command
The PRINT command
The WRITE command
The SET command
The SKIPIF command
The STAT command
The STOP command

An online reference guide to all SHAZAM commands is available.

Notes


1. Commands and command options have names. Shortened names are allowed. For example, the SAMPLE command can be referred to as SAMP. On the READ command the SKIPLINES= option can be referred to as SKIP=.

2. Commands can have continuation lines. If a command is terminated with the & character this indicates that the command is continued on the next line. An example of the use of continuation lines follows:
READ (KLEIN.txt) WG T G TIME PLAG KLAG XLAG &
     C I W1 P WGWP X

3. Variable name lists can be abbreviated by the use of a numbered range list. The variables in a numbered range list have identical names, except for the last character or characters, which are consecutive numbers. For example, the following two commands are equivalent.
STAT A1 A2 A3 A4 A5 A6
STAT A1-A6

4. In general, SHAZAM commands are not case sensitive. That is, upper case and lower case are interchangeable. However, if you wish to distinguish upper and lower case variables and file names in SHAZAM then use the command: SET NOLCUC.


The SAMPLE command

The SAMPLE command sets the sample range of the data. The general command format is:
SAMPLE beg end

where beg and end are numbers specifying the beginning and ending observations to use for subsequent calculations.

Example

The Theil textile data set has 17 observations on each variable and so the SAMPLE command should be:
SAMPLE 1 17

To restrict the analysis to start in observation 3 and end in observation 14 the appropriate SAMPLE command is:
SAMPLE 3 14


The READ command

The READ command loads the data set into SHAZAM memory and assigns variable names. The general command format is:
READ (filename) vars / options

where filename is the data filename, vars is a list of user assigned variable names, and options is a list of desired options. All the variables listed must have the same number of observations. Some useful options are:
BYVAR        Read the data by variable. The data file must be prepared so that all observations for each new variable begin on a new line (more than 1 line can be used).

NOBYVAR      Read the data by observation. The data file must be prepared so that observations for all variables begin on a new line (more than 1 line can be used). This is the default option.

LIST         List all the data on the output file.

SKIPLINES=n  Skips the number of lines specified before reading data from the data file. This option is required when the data file contains a header of descriptive information.

Rules for preparing data files for SHAZAM can be viewed. Instructions are also available for working with Microsoft Excel XLS files.

Example 1 - Loading data from a data file

For the Theil textile data set each line of the data file contains one observation on the 4 variables. The following command assigns variable names and loads the data from the file THEIL.txt into SHAZAM memory.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE

The next command shows the use of the LIST option. This option instructs SHAZAM to list the data set on the output file. This can be very useful for checking that the data set has been read in correctly.

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE / LIST

Example 2 - Small data sets - working without data files

Consider a data set with observations on height (inches) and weight (pounds) for 6 individuals (this example is from Chotikapanich and Griffiths, 1993). The data can be entered directly in the SHAZAM command file immediately following the READ command as follows.
SAMPLE 1 6
READ HEIGHT WEIGHT
69 112
56 84
62 102
67 135
70 165
63 120

An alternative way of setting up the command file is to use the BYVAR option on the READ command as follows.
SAMPLE 1 6
READ HEIGHT WEIGHT / BYVAR
69 56 62 67 70 63
112 84 102 135 165 120

Another way of reading this simple data set is to use two READ commands (one for each variable) as follows.
SAMPLE 1 6
READ HEIGHT / BYVAR
69 56 62 67 70 63
READ WEIGHT / BYVAR
112 84 102 135 165 120


The GENR command

The GENR command will create new variables from old ones and do a variety of data transformations. The general command format is:
GENR var=equation

where var is a user-assigned name of the variable to be generated and equation is an arithmetic expression which involves variables, scalars and mathematical functions. The following mathematical operators may be used:
1. unary functions (see list below)
2. ** (exponentiation)
3. * , / (multiplication, division)
4. + , - (addition, subtraction)
5. .EQ. , .NE. , .GE. , .GT. , .LE. , .LT. (relational operators)
6. .NOT. (logical operator)
7. .AND. (logical operator)
8. .OR. (logical operator)

The arithmetic expression that is given after the equal sign (=) is evaluated (any variable that is given in this expression must already exist). The result is saved in the variable that appears before the equal sign. If the result variable does not exist then SHAZAM will create it. If the result variable is already created then SHAZAM will replace the old variable with the new results. On the GENR command arithmetic expressions are evaluated from left to right. However, some operators take priority. The order of operations conforms to the priority levels given above. Expressions in parentheses are always evaluated first. Therefore, to avoid confusion use as many levels of parentheses as desired. The available unary functions are:
ABS(x)      absolute value
DUM(x)      dummy variable generator
EXP(x)      exp(x)
INT(x)      integer truncation
LAG(x)      lag a variable one time period
LAG(x,n)    lag a variable n time periods
LGAM(x)     log gamma function
LOG(x)      natural logs
MAX(x,y)    maximum of two variables
MIN(x,y)    minimum of two variables
MOD(x,y)    modulo arithmetic, remainder of x/y
NCDF(x)     standard normal cumulative distribution function
NOR(x)      normal random number with standard deviation x
SAMP(x)     draw a sample with replacement from the variable x
SEAS(x)     seasonal dummy variable with periodicity x
SIN(x)      sine (x measured in radians)
SIN(x,-1)   arcsine (defined for x in the interval [-1,1])
SQRT(x)     square roots
SUM(x)      cumulative sum of variable x
SUM(x,n)    sum of n observations on variable x starting at observation 1
TIME(x)     time index plus x
UNI(x)      uniform random number with range (0,x)
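The order of operations described above can be verified directly. A short sketch (the numbers and variable names are arbitrary):

GEN1 A=2+3*4
* A is 14: multiplication is evaluated before addition
GEN1 B=(2+3)*4
* B is 20: the expression in parentheses is evaluated first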

Example 1 - Computing a reciprocal

To compute the reciprocal of each observation in the variable A and save the result in the variable B the command to use is:
GENR B=1/A

Example 2 - Taking logarithms

When working with macroeconomic time series it is very common to use log-transformed variables in the analysis. To get the logarithm of each observation in the variable CONSUME and save the result in the new variable LCONS the command to use is:
GENR LCONS=LOG(CONSUME)

Example 3 - Computing a percentage change

To compute the percentage rate of change in the variable WAGE and save the results in the variable WR the command to use is:
GENR WR=100*(WAGE-LAG(WAGE))/LAG(WAGE)

Note that the LAG function is used to take a 1-period lag. This function must be used with some care. The first observation will be assigned the value 0. Therefore, for subsequent analysis with the WR variable, the SAMPLE command must be set to start at observation number 2. The same calculation can be done with two GENR commands as follows:
GENR LWAGE=LAG(WAGE)
GENR WR=100*(WAGE-LWAGE)/LWAGE

Example 4 - The exponentiation operator **


The command GENR E2=E**2 is the same as GENR E2=E*E

The command GENR SD=SQRT(VAR) is the same as GENR SD=VAR**(0.5)
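The relational operators listed earlier evaluate to 1 (true) or 0 (false), so they can be used to construct dummy variables directly. A sketch, assuming a variable YEAR is already in memory (the variable name and the cutoff are illustrative):

GENR POST70=(YEAR.GE.1970)
* POST70 is 1 for observations with YEAR of 1970 or later, and 0 otherwise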

More examples are in the SHAZAM User's Reference Manual.


The GEN1 command

The GEN1 command is used for scalar arithmetic. The general command format is:
GEN1 var=equation

where var is a user-assigned name of the scalar variable to be generated and equation is an arithmetic expression. The GEN1 command is equivalent to using a SAMPLE 1 1 command with a GENR command to generate a variable with only 1 observation.

Example

To compute the logarithm of 2π and save the result in the variable L2PI the command to use is:
GEN1 L2PI=LOG(2*$PI)

Note the use of the variable $PI. In general, variables that start with a $ are special SHAZAM temporary variables. The SHAZAM variable $PI contains the value of π, set at 3.1415926535898. (On the SHAZAM output not all digits are printed.)


The PRINT command

The PRINT command is used to list variables on the SHAZAM output file. Note that this command does not direct output to the printer. The general command format is:
PRINT vars / options

where vars is a list of variable names and options is a list of desired options. Some useful options are:


BEG=   Specifies the first observation to be printed.
END=   Specifies the last observation to be printed.

If the above options are not used then the range set by the SAMPLE command is used.

Example

To list the variables in the Theil textile data set the command to use is:
PRINT YEAR CONSUME INCOME PRICE

To list the data on CONSUME and PRICE for the final 6 observations the command to use is:
PRINT CONSUME PRICE / BEG=12 END=17


The WRITE command

The WRITE command is used to write variables to a file. The general command format is:
WRITE (filename) vars / options

where filename is the data filename, vars is a list of variable names, and options is a list of desired options. The available options on the WRITE command are similar to those available for the READ and PRINT commands.
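As a sketch (the output filename LDATA.txt is hypothetical), the following loads the Theil textile data, creates a log-transformed variable and writes the new data set to a file:

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
GENR LCONS=LOG(CONSUME)
* Write the year and the new variable to the file LDATA.txt
WRITE (LDATA.txt) YEAR LCONS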


The SET command

The SET command is used to set options that affect the way SHAZAM does its work. The SET command has some useful options for controlling the amount of detail that appears on the output file. Some other uses of the SET command are contained in this guide. In general, the format to turn on features is:


SET option

and the format to turn off features is:


SET NOoption

where option is the desired option. A complete description of available options is in the SHAZAM User's Reference Manual.
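As a sketch using two options that appear elsewhere in this guide:

* Distinguish upper and lower case variable and file names
SET NOLCUC
* Fix the random number seed so that repeated runs give the same results
SET RANFIX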


The SKIPIF command

The SKIPIF command is used to specify conditions under which observations are to be skipped in subsequent SHAZAM commands. The general command format is:
SKIPIF (expression)

where the expression is an arithmetic expression as described for the GENR command. For each observation, the expression is evaluated and if the result is positive (or true) the observation is excluded from the sample. The expression may use the following relational operators.
Operator   Meaning
.EQ.       Equal (=)
.NE.       Not Equal
.GE.       Greater than or equal (>=)
.GT.       Greater than (>)
.LE.       Less than or equal (<=)
.LT.       Less than (<)

Note that the operators must start and terminate with a dot (.). Upper case and lower case are interchangeable. An example of the SKIPIF command is:

SKIPIF (XVAL .LE. 5.1)

The above command will skip observations when the value of the variable XVAL is less than or equal to 5.1.

Eliminating SKIPIF

After working with a sub-set of observations it may be useful to return to data analysis with the full sample. This can be done by eliminating all SKIPIF commands with the command:
DELETE SKIP$

Note that the $ character in SKIP$ is a special character for SHAZAM variable names.

SKIPIF Messages

The SKIPIF command lists the skipped observations on the SHAZAM output file. If a large number of observations have been skipped then the output listing can be lengthy. These messages can be suppressed by using the command:
SET NOWARNSKIP
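Putting the pieces together, a sketch using the Theil textile data (the cutoff value is arbitrary):

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Exclude observations with INCOME less than 90 from the calculations
SKIPIF (INCOME .LT. 90)
STAT CONSUME INCOME
* Remove the SKIPIF condition and return to the full sample
DELETE SKIP$
STAT CONSUME INCOME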

More Documentation

More discussion on the use of the SKIPIF command is given in the SHAZAM User's Reference Manual.


The STAT command

The STAT command is used to compute descriptive statistics (including mean, standard deviation, variance, minimum and maximum). The general command format is:
STAT vars / options

where vars is a list of variable names and options is a list of desired options. Some useful options are:
PCOR      Prints a correlation matrix of the variables.

PCOV      Prints a covariance matrix of the variables.

PMEDIAN   Prints the median, mode, quartiles and interquartile range for each variable.

MEAN=     Stores the means as a vector in the variable specified.

STDEV=    Stores the standard deviations as a vector in the variable specified.

SUMS=     Stores the sum of each variable as a vector in the variable specified.

Note that, in general, options that begin with the letter P are used to print results on the SHAZAM output file. Options ending in an = sign are used to save results in the variable specified.

Example

Consider the problem of finding the correlation between height and weight for a sample of 6 individuals. The SHAZAM commands below can be used to get the answer.
SAMPLE 1 6
READ HEIGHT WEIGHT / BYVAR
69 56 62 67 70 63
112 84 102 135 165 120
STAT HEIGHT WEIGHT / PCOR
STOP

The SHAZAM output follows:


|_SAMPLE 1 6
|_READ HEIGHT WEIGHT / BYVAR
 2 VARIABLES AND      6 OBSERVATIONS STARTING AT OBS      1
|_STAT HEIGHT WEIGHT / PCOR
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
HEIGHT      6   64.500     5.2440     27.500     56.000     70.000
WEIGHT      6   119.67     28.048     786.67     84.000     165.00
CORRELATION MATRIX OF VARIABLES -     6 OBSERVATIONS
HEIGHT    1.0000
WEIGHT    0.81587    1.0000
          HEIGHT     WEIGHT
|_STOP

The output shows that the correlation between height and weight is: 0.81587.


The STOP command

The STOP command is used to terminate the SHAZAM run. The format of the command is:


STOP


Comment Statements
Any line in the SHAZAM command file that begins with * is a comment statement. These lines are ignored by SHAZAM but they will appear on the output file of results. An example is:
* Summary statistics for the textile data

Comment statements are very useful for documenting the work being done by the SHAZAM commands. It is recommended that comment statements be placed after every few lines of SHAZAM commands. This clarifies the type of work that is being performed and is helpful if the commands are needed in the future either by you or other users.

Comment lines cannot be placed in data files. However, data files can be annotated in 2 ways. The first way is to place the documentation at the very top of the file. Then the SKIPLINES= option on the READ command must be used to skip over the text, as shown below. The second way is to place the documentation at the end of the file. Before the READ command a SAMPLE command must be set to the number of observations. Then the documentation at the end of the file will be ignored by SHAZAM when the data set is loaded into SHAZAM memory.
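A sketch of the first method (assuming the data file THEIL.txt begins with a 2-line documentation header):

SAMPLE 1 17
* Skip the 2 header lines before reading the numeric data
READ (THEIL.txt) YEAR CONSUME INCOME PRICE / SKIPLINES=2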

SHAZAM Program
A SHAZAM program contains a list of commands that give instructions for processing by the SHAZAM system. A SHAZAM program is prepared in a command file. A text editor can be used to create and modify the command file. The SHAZAM command file should be assigned some appropriate filename. It may be useful to use the extension .SHA for SHAZAM command files. Then a file with a name like WORK.SHA is immediately recognizable as a SHAZAM command file. An example of a SHAZAM command file for obtaining summary statistics for the Theil textile data set is below.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Summary statistics for the textile data
STAT CONSUME INCOME PRICE
STOP


The first command is a SAMPLE command, and the second command is a READ command to load the data and assign variable names. The final command is the STOP command. This is typical of most SHAZAM programs. The STOP command tells SHAZAM that the work is finished. Any lines that are typed after the STOP command are ignored by SHAZAM. The next SHAZAM program extends the analysis to consider working with log-transformed data. Note the use of comment statements. Also, near the end of the program, note the use of the SAMPLE command to specify a different sample period for the commands that follow.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Summary statistics for the textile data
STAT CONSUME INCOME PRICE
* Now get logarithms of the data
GENR LCONS=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LPRICE=LOG(PRICE)
* List the data
PRINT LCONS LINC LPRICE
* Summary statistics for the log-transformed data set
STAT LCONS LINC LPRICE / PCOV PCOR
* Write the new data to the file TLOG.txt
WRITE (TLOG.txt) YEAR LCONS LINC LPRICE
*
* Repeat the analysis on a sub-set of the data
SAMPLE 3 17
STAT LCONS LINC LPRICE / PCOV PCOR
STOP

SHAZAM Output File

As SHAZAM executes the commands listed in the SHAZAM command file it writes results to an output file. The output file may also contain error messages. With Windows SHAZAM the output is displayed in the SHAZAM window. The SHAZAM output filename is specified by the user. It may be useful to use the extension .OUT for SHAZAM output files. Then a file with a name like WORK.OUT is immediately recognizable as a SHAZAM output file. As an example, consider the SHAZAM program that analyzes the Theil textile data. The output that is produced from a STAT command is below.
|_* Summary statistics for the log-transformed data set
|_STAT LCONS LINC LPRICE / PCOV PCOR
NAME        N   MEAN       ST. DEV      VARIANCE     MINIMUM    MAXIMUM
LCONS      17   4.8864     .18216       .33184E-01   4.5951     5.1240
LINC       17   4.6333     .51253E-01   .26268E-02   4.5581     4.7212
LPRICE     17   4.3118     .22141       .49022E-01   3.9627     4.6151
CORRELATION MATRIX OF VARIABLES -    17 OBSERVATIONS
LCONS      1.0000
LINC       .97862E-01   1.0000
LPRICE    -.93596       .22212       1.0000
           LCONS        LINC         LPRICE
COVARIANCE MATRIX OF VARIABLES -    17 OBSERVATIONS
LCONS      .33184E-01
LINC       .91368E-03   .26268E-02
LPRICE    -.37750E-01   .25206E-02   .49022E-01
           LCONS        LINC         LPRICE

How to interpret the output file

By inspecting the above output it is clear that some care is required to interpret the results that appear in a SHAZAM output file. A single SHAZAM command can result in lengthy output that involves the computation of a variety of statistics. The following should be noted.

Note 1: Lines that have the |_ prefix are the commands and comment statements that were entered by the user.

Note 2: Some statistics may be familiar to the user and other statistics may not be familiar. Therefore the user should focus on the interpretation of the statistics that they have studied from course textbooks and/or journal articles. In preparing a research report users should never report output results that they do not understand. This could lead to erroneous discussion and conclusions.

Note 3: Numeric results are reported in a form that is not suitable for pasting into a word processing document. The results give more significant digits than are needed for reporting research results in a report. In the above output, note that the variance of the variable LCONS is reported as .33184E-01 and the variance of the variable LINC is reported as .26268E-02. E-01 means multiply by 10 to the power of -1 and E-02 means multiply by 10 to the power of -2. Therefore the number .33184E-01 is 0.033184 and the number .26268E-02 is 0.0026268. This type of reporting notation is a convenience for SHAZAM. Never use the E notation when presenting results in a report. Always consider rounding the numerical results to a form that is appropriate for the purposes of report presentation. For example, SHAZAM reports the number 0.07749 in the form .7749E-01. This means:

.7749E-01 = (.7749)(10^-1) = 0.07749

This result can be reported as: 0.077. Similarly, SHAZAM may report the number 8,347,562 in the form 0.83476E+07. This means (0.83476)(10^7). In this case, the number can be reported as 8,347 thousand or 8.3 million.

A Note on Printing the SHAZAM output file

To print the SHAZAM output file from a word processor set the font to Courier 10pt. That is, use a fixed-width font. This will ensure that the output is displayed in a readable style. Also set the margins so that lines do not wrap around to the next line.

Memory


Memory is the working space used for storing the data and performing calculations while SHAZAM is executing. The size of the data set and complexity of the calculations is limited by the amount of memory that SHAZAM has available. The memory allocation can be found on the SHAZAM output file. A listing similar to the following will typically appear near the top of the SHAZAM output file.
Hello/Bonjour/Aloha/Howdy/G'Day/Kia Ora/Konnichiwa/Buenos Dias/Nee Hau/Ciao
Welcome to SHAZAM - Version 10.0 - JUL 2004
SYSTEM=LINUX   PAR= 781

The above message contains some important information. It displays the version of SHAZAM and the release date. The SYSTEM= parameter gives the type of operating system that is being used. Finally, the parameter PAR= gives the amount of memory in batches of 1024 bytes. (PAR is an abbreviation for paragraph). If the memory allocation is exhausted then calculations cannot continue and an error message will be printed on the SHAZAM output file. The PAR command can be used to increase the memory allocation. The format of the PAR command is:
PAR number

where number specifies the amount of memory that is needed.
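For example, if a run fails with a memory error then a command like the following (the value is illustrative) can be placed at the top of the command file to request a workspace of roughly 2 megabytes:

PAR 2000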

Error Messages

Errors can arise in a variety of ways. The SHAZAM output file must be carefully checked for error messages. To begin, the output messages that follow READ commands must be checked to verify that the data set was loaded successfully. SHAZAM will continue processing commands after errors are encountered, so all subsequent results may be unreliable. The SHAZAM command file must be appropriately modified so that the resulting output file is completely free of error messages. General types of errors are:
1. Incorrect Command Format
2. Invalid Operations
3. Insolvable Problems
4. Nonsense Results

Each of these is explained in more detail below. Some errors may be relatively easy to correct while others can be difficult. The help of a SHAZAM advisor may be needed. SHAZAM work should be started well in advance of any deadline so that unanticipated problems, which are the reality of any computing endeavour, can be dealt with in a satisfactory manner.


Incorrect Command Format

SHAZAM commands have a required format. If the rules are violated then SHAZAM cannot interpret the command and error messages will be printed on the SHAZAM output file. Examples of some common error messages follow. The STAT command name is spelled incorrectly . . .
|_STOT CONSUME INCOME PRICE
UNKNOWN COMMAND...STOT   ...CHECK OUTPUT CAREFULLY

The variable name CONSUME is spelled incorrectly . . .


|_GENR LCONS=LOG(CONS)
                     $
...SYNTAX ERROR IN LINE ABOVE

Note that the $ character points to the location of the problem. The above mistake is corrected, but now the right-hand parenthesis is omitted . . .
|_GENR LCONS=LOG(CONSUME
...WARNING..CONTINUATION EXPECTED ON NEXT LINE
...MISMATCHED PARENTHESES

Invalid Operations

Problems occur with calculations that involve divide by zero, or the logarithm of a negative number. When these are encountered, SHAZAM sets the result to a missing value code and prints a warning message on the output file. This may be a signal to a problem that needs to be corrected. An example of some SHAZAM output where an attempt was made to take the logarithm of a negative number follows.
|_GEN1 A=-5
|_GEN1 B=LOG(A)
..WARNING.ILLEGAL LOG IN OBS    1,VALUE= -5.00    USING MISSING CODE=-99999.
..OBSERVATION    1 IS ASSIGNED MISSING CODE=-99999.
|_PRINT A B
   A   -5.000000
   B   -99999.00

Insolvable Problems

Any statistical analysis is limited by the quality of the data set. Sometimes it may not be possible to get a satisfactory solution. For example, the following error message puzzles many users.
...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW 3


This means that a matrix inversion was required and SHAZAM found that the matrix is singular. With the OLS command this means that the explanatory variables have perfect collinearity.
Nonsense Results

Sometimes results are obtained that do not make much sense. This could occur if an error message appeared earlier on. However, it may also occur even when the output file has no error messages. Some users think that there is a bug in SHAZAM. However, SHAZAM is a well-tested system and so other possibilities need to be carefully considered. The command file should be carefully reviewed. After READ commands and GENR commands it is a good idea to put a PRINT command that lists the data. The data listing should be inspected to make sure that the data set is constructed correctly. When this is verified then the PRINT commands can be removed to make the output file easier to browse through.

Rounding Errors
Rounding means shortening the fractional part of a number. Computer programs have a limit to the precision with which numbers can be stored. SHAZAM is designed to retain the maximum number of digits possible in computer memory, even if they are not all printed on the output file. SHAZAM also uses highly accurate computer algorithms to minimize the impact of rounding errors.

Although rounding errors may appear to be a trivial problem, the cumulative effect of rounding errors can have a significant impact on numerical calculations. To illustrate this point, the following example has been created. In each successive row, one significant digit is dropped and the numbers are rounded. By comparing the products of the four numbers in each successive row, the effect of rounding errors can be seen.
2/13       6/13       9/13       10/13      PRODUCT
.1538462   .4615385   .6923077   .7692308   .037813819
.153846    .461538    .692308    .769231    .037813755
.15385     .46154     .69231     .76923     .037814962
.1538      .4615      .6923      .7692      .037797376
.154       .462       .692       .769       .037861266
.15        .46        .69        .77        .036659700
.2         .5         .7         .8         .056000000

Note the difference between the products in the first and last rows: the two values differ by about 48 percent. It should now be clear that rounding errors can significantly alter numerical results.

The Pentium Bug

Rounding error became the subject of much public discussion and controversy in 1994 when problems with the Pentium chip were described in computer newsgroups. The Pentium is the microprocessor for the PC developed by Intel as the successor to the 386 and 486 microprocessors. Consider the following calculation:

( (4195835 / 3145727) * 3145727) - 4195835

The result should be 0. But computers with the faulty Pentium chip get the result -256. This was recognized as a bug. After much embarrassing publicity Intel corrected the problem and agreed to supply users with replacement chips. SHAZAM can detect the presence of the Pentium bug with the CHECKOUT command.
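The calculation can also be checked by hand with the GEN1 command. A minimal sketch (on a correctly functioning processor the printed value should be 0):

GEN1 TEST=(4195835/3145727)*3145727 - 4195835
PRINT TEST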

How to Run SHAZAM


How SHAZAM Works
How to use Windows SHAZAM
How to use SHAZAMD (MS-DOS command session)
How to run SHAZAM over the internet


How SHAZAM Works

SHAZAM can be used in either interactive mode or batch mode. In interactive mode SHAZAM commands are typed at a SHAZAM command prompt and SHAZAM responds by displaying results on the screen. The STOP command finishes the session. In batch mode, SHAZAM processes a command file without any user interaction. The SHAZAM command file must be prepared with a text editor. The results are written to an output file for the user to inspect. After reviewing the output file, the user can then modify the command file and re-submit it to SHAZAM.

Using SHAZAM in Interactive or "Talk" mode

When SHAZAM begins execution it will first display a banner that contains site license information. It then gives the SHAZAM command prompt:
TYPE COMMAND :__

Type any SHAZAM command and then press the Enter key. SHAZAM will respond after each command. Type STOP to terminate the SHAZAM session and return to the operating system. Using SHAZAM in interactive mode can be useful for short tasks or for one of the following:

SHAZAM tutorial - the DEMO command
On-line HELP
Using SHAZAM as a calculator

Using SHAZAM as a Calculator



Arithmetic
Calculating a Sample Mean and Variance
Calculating a Mean and Variance for 2 Different Samples
Calculating the Mean Absolute Deviation
Calculating Factorials and Combinations


Arithmetic

The GEN1 command can be used as a calculator. For example, the command below computes and prints a sum of numbers:
GEN1 40+55+10

Results can be saved for use in subsequent calculations. This is illustrated in the next example.
GEN1 REV=3000+1326
GEN1 COST=48+89.5+1200
GEN1 PROFIT=REV-COST
PRINT REV COST PROFIT
STOP

Calculating a Sample Mean and Variance

The next commands calculate the mean, variance and standard deviation for a sample of 6 observations. Note that the calculation for the sample variance uses a divisor of 5.
SAMPLE 1 6
READ X / BYVAR
23 43 26 43 13 22
STAT X
STOP
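The divisor of 5 (that is, n - 1) can be verified with a few more commands. A sketch (the variable names DEV2, SS and VARX are illustrative; the MEAN= and SUMS= options on the STAT command are described in the STAT command section):

SAMPLE 1 6
READ X / BYVAR
23 43 26 43 13 22
* Save the sample mean in M
STAT X / MEAN=M
* Squared deviations from the mean
GENR DEV2=(X-M)**2
* Save the sum of the squared deviations in SS
STAT DEV2 / SUMS=SS
* Divide by n - 1 = 5; VARX should match the VARIANCE printed by STAT X
GEN1 VARX=SS/5
PRINT VARX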

Calculating a Mean and Variance for 2 Different Samples

Consider 2 samples of observations. The first sample has 6 observations and the second sample has 9 observations. The commands below show how to calculate a mean and variance for each sample.
SAMPLE 1 6
READ X / BYVAR
23 43 26 43 13 22
STAT X
* The second sample has 9 observations
SAMPLE 1 9
* Be sure to give the second variable a different name than X
READ Y / BYVAR
38 58 23 678 432 23 52 2 55
STAT Y
STOP

Calculating the Mean Absolute Deviation


SAMPLE 1 6
READ X / BYVAR
23 43 26 43 13 22
STAT X / MEAN=M
GENR AD=ABS(X-M)
STAT AD
STOP

The SHAZAM output follows.


|_SAMPLE 1 6
|_READ X / BYVAR
 1 VARIABLES AND      6 OBSERVATIONS STARTING AT OBS      1
|_STAT X / MEAN=M
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
X           6   28.333     12.160     147.87     13.000     43.000
|_GENR AD=ABS(X-M)
|_STAT AD
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
AD          6   9.7778     5.7568     33.141     2.3333     15.333
|_STOP

The mean absolute deviation is listed as the MEAN in the output from the final STAT command. The results show that the mean absolute deviation is 9.7778.

Calculating Factorials and Combinations

A result from calculus is that for an integer n: Γ(n) = (n - 1)! where Γ(n) is the gamma function. With the GENR or GEN1 commands the LGAM function calculates the log of the gamma function. The anti-log of this function can be used to compute factorials. For example, the number 5! is calculated with the following SHAZAM commands.
SAMPLE 1 1
GEN1 N=5
GEN1 FAC=EXP(LGAM(N+1))
PRINT FAC
STOP

Now consider the problem of calculating the number of combinations of k objects chosen from n. The formula, known as the "binomial coefficient", is:

C(n,k) = n! / ( k! (n-k)! )

An example is shown with the next SHAZAM commands.


SAMPLE 1 1
GEN1 N=6
GEN1 K=3
GEN1 C=EXP(LGAM(N+1)-LGAM(K+1)-LGAM(N-K+1))
PRINT C
STOP

The SHAZAM output follows.


|_SAMPLE 1 1
|_GEN1 N=6
|_GEN1 K=3
|_GEN1 C=EXP(LGAM(N+1)-LGAM(K+1)-LGAM(N-K+1))
|_PRINT C
   C   20.00000
|_STOP

The result shows that the total number of combinations of 3 objects chosen from 6 is 20.
Statistics

Mean, Variance, Median, Quartiles

The STAT command computes descriptive statistics on variables.

Example

A data set is available with inflation rates for five industrial countries. Which country's inflation rate is the most volatile? Higher volatility is associated with higher variance. The SHAZAM commands (filename: IRATE.SHA) below can be used to answer the question.

SAMPLE 1 21
READ (IRATE.txt) YEAR USA UK JAPAN GERMANY FRANCE
STAT USA UK JAPAN GERMANY FRANCE / PMEDIAN
STOP

The STAT command computes the mean, variance, standard deviation, minimum and maximum for each variable in the variable list. The PMEDIAN option requests calculation of the median, mode, quartiles and interquartile range for each variable. The SHAZAM output can be viewed. The results show that the inflation rate of the United Kingdom is the most variable with a standard deviation of 6.3. The least variable rate is Germany's inflation rate with a standard deviation of 1.7. (Note that the numerical results have been rounded to 1 decimal place. The SHAZAM output typically reports more digits than are needed for report presentation.)

An equivalent way of preparing the above command file is to assign variable names C1, C2, C3, C4 and C5 for the inflation rates in the 5 countries. This notation saves typing effort. SHAZAM commands that do the same work as above are:

SAMPLE 1 21
READ (IRATE.txt) YEAR C1-C5
STAT C1-C5 / PMEDIAN
STOP

Changing the Sample Period

The world economy was affected by the Middle East War of October 1973 that was followed by an Arab oil embargo with a subsequent quadrupling in international oil prices. The SHAZAM commands that follow investigate differences in inflation rates for the pre-embargo and post-embargo period. Descriptive statistics are obtained for the two sample periods 1960 to 1973 and 1974 to 1980.

SAMPLE 1 21
READ (IRATE.txt) YEAR USA UK JAPAN GERMANY FRANCE
* Sample period 1960 to 1973
SAMPLE 1 14
STAT USA UK JAPAN GERMANY FRANCE
* Sample period 1974 to 1980
SAMPLE 15 21
STAT USA UK JAPAN GERMANY FRANCE
STOP

The above commands demonstrate that the SAMPLE command is an important and useful command. The SAMPLE command is typically the first command in a SHAZAM program. The sample period can be changed by specifying a new SAMPLE command. The SHAZAM output can be viewed.


SHAZAM output

|_SAMPLE 1 21
|_READ (IRATE.txt) YEAR USA UK JAPAN GERMANY FRANCE
UNIT 88 IS NOW ASSIGNED TO: IRATE.txt
 6 VARIABLES AND     21 OBSERVATIONS STARTING AT OBS      1
|_STAT USA UK JAPAN GERMANY FRANCE / PMEDIAN
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
USA        21   5.1238     3.6950     13.653     1.1000     13.600
UK         21   8.5476     6.3210     39.956     1.0000     24.200
JAPAN      21   7.3476     4.6330     21.465     3.6000     24.600
GERMANY    21   3.8667     1.6764     2.8103     1.5000     7.0000
FRANCE     21   6.7143     3.5791     12.810     2.6000     14.000

VARIABLE = USA
MEDIAN = 4.3000   LOWER 25%= 1.5500   UPPER 25%= 7.0000   INTERQUARTILE RANGE= 5.450
MODE = 2.8000 WITH 2 OBSERVATIONS
WARNING: MULTIPLE MODES - THE MAXIMUM IS REPORTED

VARIABLE = UK
MEDIAN = 6.5000   LOWER 25%= 3.8000   UPPER 25%= 14.650   INTERQUARTILE RANGE= 10.85
MODE NOT APPLICABLE

VARIABLE = JAPAN
MEDIAN = 6.3000   LOWER 25%= 4.4500   UPPER 25%= 8.0500   INTERQUARTILE RANGE= 3.600
MODE = 3.6000 WITH 2 OBSERVATIONS

VARIABLE = GERMANY
MEDIAN = 3.7000   LOWER 25%= 2.3000   UPPER 25%= 5.3500   INTERQUARTILE RANGE= 3.050
MODE = 7.0000 WITH 2 OBSERVATIONS
WARNING: MULTIPLE MODES - THE MAXIMUM IS REPORTED

VARIABLE = FRANCE
MEDIAN = 5.5000   LOWER 25%= 3.4000   UPPER 25%= 9.5000   INTERQUARTILE RANGE= 6.100
MODE = 5.5000 WITH 2 OBSERVATIONS
WARNING: MULTIPLE MODES - THE MAXIMUM IS REPORTED
|_STOP


SHAZAM output

|_SAMPLE 1 21
|_READ (IRATE.txt) YEAR USA UK JAPAN GERMANY FRANCE
UNIT 88 IS NOW ASSIGNED TO: IRATE.txt
 6 VARIABLES AND     21 OBSERVATIONS STARTING AT OBS      1
|_* Sample period 1960 to 1973
|_SAMPLE 1 14
|_STAT USA UK JAPAN GERMANY FRANCE
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
USA        14   3.0500     1.8275     3.3396     1.1000     6.2000
UK         14   4.8000     2.3485     5.5154     1.0000     9.5000
JAPAN      14   6.0857     2.1300     4.5367     3.6000     12.000
GERMANY    14   3.4143     1.6379     2.6829     1.5000     7.0000
FRANCE     14   4.5143     1.5195     2.3090     2.6000     7.5000
|_* Sample period 1974 to 1980
|_SAMPLE 15 21
|_STAT USA UK JAPAN GERMANY FRANCE
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
USA         7   9.2714     2.8582     8.1690     5.8000     13.600
UK          7   16.043     4.7878     22.923     8.3000     24.200
JAPAN       7   9.8714     7.1114     50.572     3.6000     24.600
GERMANY     7   4.7714     1.4568     2.1224     2.7000     7.0000
FRANCE      7   11.114     1.9540     3.8181     9.1000     14.000
|_STOP

Covariance and Correlation

The STAT command has options to compute the covariances and correlations between variables. The PCOV option will list the sample covariances and the PCOR option will list the sample correlations.

Example

The business sections of newspapers give regular reports on movements in interest rates and foreign exchange rates. For example, the following is from the Vancouver Sun (byline: Norma Greenaway, November 26, 1997).
The Bank of Canada nudged its benchmark rate a quarter point to four per cent on Monday. ... The central bank's modest hike ... set off a round of rate increases by the major commercial banks. Beginning today, the prime lending rate -- available only to their most credit-worthy customers -- will climb to 5.5 per cent from 5.25 per cent. Mortgage rates were not immediately affected. ... The dollar strengthened slightly against its U.S. counterpart in the wake of the announcement, closing at 70.57 cents -- up about one-fifth of a cent from its opening.

The above recognizes the important associations between various economic variables. The strength of these associations can be considered by analyzing a data set. A monthly data set with the Canadian chartered bank prime business loan rate, the 5 year mortgage rate and the Canadian/U.S. $ exchange rate was collected from the Statistics Canada CANSIM data base. The SHAZAM commands that follow first read the data and assign variable names. The STAT command is then used to obtain the correlations between the variables of interest.
SAMPLE 1 204
READ (MOBS.txt) DATE PRIME MRATE5 USCAN
STAT PRIME MRATE5 USCAN / PCOV PCOR
STOP

The SHAZAM output can be viewed. The results show that the estimated correlation between the prime lending rate and the 5-year mortgage rate is 0.93 (for presentation purposes, the numerical results have been rounded to 2 decimal places). The estimated correlation between the prime lending rate and the Canadian/U.S. $ exchange rate is -0.49.



SHAZAM output

|_SAMPLE 1 204
|_READ (MOBS.txt) DATE PRIME MRATE5 USCAN
UNIT 88 IS NOW ASSIGNED TO: MOBS.txt
 4 VARIABLES AND    204 OBSERVATIONS STARTING AT OBS      1
|_STAT PRIME MRATE5 USCAN / PCOV PCOR
NAME        N   MEAN       ST. DEV       VARIANCE      MINIMUM    MAXIMUM
PRIME     204   10.963     3.6408        13.255        4.7500     22.750
MRATE5    204   12.034     2.9599        8.7610        6.9400     21.460
USCAN     204   1.2669     0.83719E-01   0.70089E-02   1.1279     1.4129
CORRELATION MATRIX OF VARIABLES -   204 OBSERVATIONS
PRIME      1.0000
MRATE5     0.92975     1.0000
USCAN     -0.49047    -0.42991      1.0000
           PRIME       MRATE5       USCAN
COVARIANCE MATRIX OF VARIABLES -   204 OBSERVATIONS
PRIME      13.255
MRATE5     10.019      8.7610
USCAN     -0.14950    -0.10653      0.70089E-02
           PRIME       MRATE5       USCAN
|_STOP

Computing Probabilities for Normal Random Variables

The cumulative distribution function (CDF) for the standard normal random variable can be computed with the NCDF function on the GENR command. The command format is:
GENR prob=NCDF(var)

where var is a variable that contains numbers and prob is a variable that will contain the probabilities. If the probability for only one number is required then the GEN1 command should be used. The command format is:
GEN1 prob=NCDF(z)

where z is a number or scalar variable.
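For instance, since the standard normal CDF evaluated at 1.96 is approximately 0.975, the following commands (a minimal sketch) should print a value close to 0.975:

GEN1 P=NCDF(1.96)
PRINT P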


The DISTRIB command can be used for the calculation of the properties of a variety of probability distributions. For the normal distribution, the general format of the DISTRIB command is:
DISTRIB var / TYPE=NORMAL options

where var is a variable with values and options is a list of desired options. The option TYPE=NORMAL specifies that the normal distribution is required. Some useful options are:
INVERSE    Computes critical values of the distribution. The data in the var variable must be entered as probabilities.

MEAN=      Specifies the value of the population mean. For the normal distribution the default is MEAN=0.

VAR=       Specifies the value of the population variance. For the normal distribution the default is VAR=1.

Options that save results:

CDF=       Saves the values of the cumulative distribution function in the variable specified.

CRITICAL=  Saves the critical values in the variable specified when the INVERSE option is used.

Example

This example is from an exercise in Newbold [1995, Exercise 19, Chapter 5]. Suppose that anticipated consumer demand for a product next month can be represented by a normal random variable with mean 1200 units and standard deviation 100 units. Denote the random variable by X. For some value x, the CDF is P(X < x) = F(x). The random variable Z = (X - 1200)/100 has a standard normal distribution (mean 0 and variance 1).
Questions and Solutions

(a) What is the probability that sales will exceed 1000 units? We need to find:
P(X > 1000) = 1 - P(X < 1000) = 1 - F(1000)

Note that:
P(X < 1000) = P(Z < (1000-1200)/100) = P(Z < -2)

(b) What is the probability that sales will be between 1100 and 1300 units? We need to find:
P(1100 < X < 1300) = P(X < 1300) - P(X < 1100) = F(1300) - F(1100)

Note that:

P(X < 1100) = P(Z < (1100-1200)/100) = P(Z < -1)

and

P(X < 1300) = P(Z < (1300-1200)/100) = P(Z < 1)

(c) The probability is 0.10 that sales will be more than how many units? We need to find a value b such that:
P(X > b) = .10

The SHAZAM commands (filename: NPROB.SHA) that follow show how to compute the answers. Note that the GEN1 command is used for scalar arithmetic.
SAMPLE 1 1
* Part (a)
GEN1 ANSWER = 1 - NCDF((1000-1200)/100)
PRINT ANSWER
* Part (b)
GEN1 ANSWER = NCDF((1300-1200)/100) - NCDF((1100-1200)/100)
PRINT ANSWER
* Part (c)
* Compute the variance of X
GEN1 SIG2=100**2
GEN1 PROB=.1
DISTRIB PROB / TYPE=NORMAL MEAN=1200 VAR=SIG2 INVERSE CRITICAL=ANSWER
PRINT ANSWER
STOP

The SHAZAM output can be viewed. The answers to the questions are:

(a) P(X > 1000) = 0.977
(b) P(1100 < X < 1300) = 0.683
(c) P(X > 1328) = 0.10


SHAZAM output
|_SAMPLE 1 1
|_* Part (a)
|_GEN1 ANSWER = 1 - NCDF((1000-1200)/100)
|_PRINT ANSWER
   ANSWER   .9772499
|_* Part (b)
|_GEN1 ANSWER = NCDF((1300-1200)/100) - NCDF((1100-1200)/100)
|_PRINT ANSWER
   ANSWER   .6826895
|_* Part (c)
|_* Compute the variance of X
|_GEN1 SIG2=100**2
|_GEN1 PROB=.1
|_DISTRIB PROB / TYPE=NORMAL MEAN=1200 VAR=SIG2 INVERSE CRITICAL=ANSWER
NORMAL DISTRIBUTION - MEAN=   1200.0     VARIANCE=   10000.
         PROBABILITY   CRITICAL VALUE   PDF
PROB
ROW    1   .10000         1328.2        .17550E-02
|_PRINT ANSWER
   ANSWER   1328.155
|_STOP


Simulation Experiments

Probability and Expected Value - A Dice Toss Experiment
Unbiased Estimators and their Sampling Distribution
Measuring the Power of a Test

Probability and Expected Value - A Dice Toss Experiment


Suppose the random variable X is the number resulting from the toss of a fair die. The probability function is:

P(X = x) = 1/6   for x = 1, 2, 3, 4, 5, 6

Let the random variable Y be the number resulting from the toss of a second die. The sum of the two faces is Z = X + Y. The probability function of Z is:

z      2     3     4     5     6     7     8     9     10    11    12
P(z)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The expected value of Z is:

E(Z) = Σ z P(z) = (2)(1/36) + (3)(2/36) + ... + (12)(1/36) = 7

The probability function can be derived empirically by using relative frequencies to estimate probabilities. That is, if two dice are tossed a large number of times, the relative frequencies for each possible outcome should be close to the theoretical probabilities given above. The average over all dice tosses of the sum of the two dice faces should be close to the expected value. That is, expectation has the interpretation as the average value of a random variable over a large number of trials.

The dice toss experiment can be simulated with a computer program. A random number generator is used to simulate the repeated tosses of two dice. Relative frequencies and summary statistics are then calculated. This is shown with the SHAZAM commands:

* Set the number of tosses
GEN1 N=500
SAMPLE 1 N
* Toss 2 dice
GENR x=INT(UNI(6)) + 1
GENR y=INT(UNI(6)) + 1
STAT x y / PFREQ
* Calculate the sum
GENR sum=x+y
STAT sum / PFREQ
STOP

On the GENR command the UNI(b) function is used to generate a uniform random number x such that 0 < x < b. The INT(a) function returns the integer part of a. Therefore, the function INT(UNI(6)) will generate a number that is one of 0, 1, 2, 3, 4 or 5 (each is equally likely).

In the above commands the number of trials is set to 500. This choice is arbitrary. An increase in the number of trials will require more computing time. But with high speed personal computers this may not be a concern. A choice of 10,000 or 20,000 for the number of trials will give greater accuracy.

The SHAZAM output can be viewed. Note that, since the numerical results depend on random numbers, different runs of the program will give different answers. To obtain the same random numbers in different runs the command SET RANFIX should be placed at the top of the SHAZAM commands.

The figures below give plots of the probability functions for X and Z.

SHAZAM output - Dice Toss Experiment



|_* Set the number of tosses
|_GEN1 N=500
|_SAMPLE 1 N
|_* Toss 2 dice
|_GENR x=INT(UNI(6)) + 1
|_GENR y=INT(UNI(6)) + 1
|_STAT x y / PFREQ
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
X         500   3.4720     1.7411     3.0313     1.0000     6.0000
Y         500   3.5100     1.6827     2.8316     1.0000     6.0000

VARIABLE = X
VALUE        FREQUENCY   PERCENT   CUMULATIVE
1.0000000       87       0.17400    0.17400
2.0000000       86       0.17200    0.34600
3.0000000       90       0.18000    0.52600
4.0000000       66       0.13200    0.65800
5.0000000       83       0.16600    0.82400
6.0000000       88       0.17600    1.00000
MEDIAN = 3.0000   LOWER 25%= 2.0000   UPPER 25%= 5.0000   INTERQUARTILE RANGE= 3.000
MODE = 3.0000 WITH 90 OBSERVATIONS

VARIABLE = Y
VALUE        FREQUENCY   PERCENT   CUMULATIVE
1.0000000       80       0.16000    0.16000
2.0000000       80       0.16000    0.32000
3.0000000       92       0.18400    0.50400
4.0000000       78       0.15600    0.66000
5.0000000       93       0.18600    0.84600
6.0000000       77       0.15400    1.00000
MEDIAN = 3.0000   LOWER 25%= 2.0000   UPPER 25%= 5.0000   INTERQUARTILE RANGE= 3.000
MODE = 5.0000 WITH 93 OBSERVATIONS

|_* Calculate the sum
|_GENR sum=x+y
|_STAT sum / PFREQ
NAME        N   MEAN       ST. DEV    VARIANCE   MINIMUM    MAXIMUM
SUM       500   6.9820     2.3445     5.4967     2.0000     12.000

VARIABLE = SUM
VALUE        FREQUENCY   PERCENT   CUMULATIVE
2.0000000       14       0.02800    0.02800
3.0000000       28       0.05600    0.08400
4.0000000       48       0.09600    0.18000
5.0000000       39       0.07800    0.25800
6.0000000       67       0.13400    0.39200
7.0000000       90       0.18000    0.57200
8.0000000       83       0.16600    0.73800
9.0000000       54       0.10800    0.84600
10.000000       46       0.09200    0.93800
11.000000       22       0.04400    0.98200
12.000000        9       0.01800    1.00000
MEDIAN = 7.0000   LOWER 25%= 5.0000   UPPER 25%= 9.0000   INTERQUARTILE RANGE= 4.000
MODE = 7.0000 WITH 90 OBSERVATIONS
|_STOP

Unbiased Estimators and their Sampling Distribution

Consider the random variables X1, X2, ..., Xn as a random sample from a population with mean μ. The average value of these observations is the sample mean. The sample mean is a random variable that is an estimator of the population mean. The expected value of the sample mean is equal to the population mean μ. Therefore, the sample mean is an unbiased estimator of the population mean.

How does this work in practice? Suppose that a data set is collected with n numerical observations x1, x2, ..., xn. A numerical estimate of the population mean can be calculated. Since only a sample of observations is available, the estimate of the mean can be either less than or greater than the true population mean. If the sampling experiment was repeated a second time then a different set of numerical observations would be obtained. Therefore, the estimate of the population mean would be different from the estimate calculated from the first sample. However, the average of the estimates calculated over many repetitions of the sampling experiment will equal the true population mean.

This can be illustrated with a computer simulation. Suppose that a sample of 8 observations is drawn from a population that has a uniform distribution on the interval [0,4]. That is, the population mean is 2. A computer program is used to generate 1000 different samples of 8 observations. An estimate of the mean is calculated for each sample. The results for the first 50 trials are shown below.
        ---------------- Sample Observations ----------------
Trial     x1     x2     x3     x4     x5     x6     x7     x8    Sample Mean
  1     0.884  3.816  0.663  0.412  0.523  3.934  3.425  0.553     1.776
  2     2.033  0.538  2.475  3.411  3.647  2.608  3.875  3.183     2.721
  3     1.083  0.111  0.804  3.485  1.739  3.021  2.601  2.469     1.914
  4     2.579  1.017  0.362  3.455  1.312  0.280  0.906  0.295     1.276
  5     2.733  3.816  3.824  3.573  2.394  2.991  2.409  3.264     3.126
  6     0.376  0.346  2.247  0.884  2.836  1.334  2.225  2.217     1.558
  7     1.753  2.217  3.492  3.006  1.260  2.859  1.230  2.888     2.338
  8     3.522  2.792  3.360  1.069  3.301  2.549  2.380  2.586     2.695
  9     1.260  2.152  3.699  0.789  1.385  0.671  2.093  3.050     1.887
 10     0.214  3.345  2.085  0.273  1.415  0.907  2.292  3.080     1.701
 11     1.739  2.483  2.189  2.321  1.047  3.794  0.627  1.010     1.901
 12     2.785  1.282  0.619  2.932  2.336  0.789  0.405  1.341     1.561
 13     0.030  0.744  2.034  2.262  1.024  1.496  2.262  1.290     1.393
 14     0.111  2.446  2.903  1.650  2.615  2.431  0.361  1.540     1.757
 15     3.521  1.856  1.024  0.832  1.724  1.142  2.578  0.973     1.706
 16     2.917  2.954  3.839  3.183  3.699  3.801  2.748  2.579     3.215
 17     1.106  2.225  2.984  2.520  1.828  3.596  3.316  3.854     2.678
 18     3.853  3.588  0.848  0.664  3.176  1.761  1.717  2.314     2.240
 19     3.603  0.804  3.714  2.218  2.734  1.423  1.431  1.188     2.139
 20     3.317  3.943  2.167  1.791  2.801  2.535  1.666  1.828     2.506
 21     2.263  3.508  2.079  2.602  2.072  0.532  0.805  0.068     1.741
 22     3.273  1.122  0.989  0.841  3.972  3.162  3.449  2.536     2.418
 23     1.482  2.469  0.628  1.541  0.142  1.401  3.346  1.512     1.565
 24     2.050  3.346  1.328  2.691  1.586  3.236  0.503  0.260     1.875
 25     0.260  1.233  1.380  3.538  3.288  2.949  0.260  1.807     1.839
 26     3.604  1.483  2.743  2.426  1.630  0.186  1.336  3.163     2.071
 27     0.430  1.866  3.546  0.651  2.684  2.625  1.078  0.304     1.648
 28     2.500  1.004  0.356  0.231  0.415  3.899  1.534  3.501     1.680
 29     1.564  2.890  1.741  0.886  3.641  0.363  2.433  0.989     1.814
 30     0.268  1.873  1.343  0.120  3.184  0.238  0.216  2.897     1.267
 31     3.987  2.455  1.962  1.431  1.048  0.827  0.009  0.805     1.566
 32     0.378  2.757  2.883  2.956  2.905  1.174  2.013  2.595     2.208
 33     1.292  2.080  0.290  2.934  0.695  1.373  0.621  1.874     1.395
 34     0.327  2.979  3.200  3.885  3.656  3.929  2.743  3.848     3.071
 35     3.752  0.040  0.290  2.051  2.987  1.543  2.950  0.084     1.712
 36     1.506  1.749  2.198  3.200  0.998  2.294  2.147  3.856     2.243
 37     0.799  1.108  0.990  0.799  2.979  1.336  2.721  1.639     1.546
 38     3.952  2.773  3.819  1.336  0.011  0.578  0.025  3.171     1.958
 39     0.187  3.996  0.173  2.876  2.309  3.885  0.813  3.686     2.241
 40     2.912  1.690  1.602  2.927  0.939  3.244  3.871  3.650     2.604
 41     0.703  1.845  3.466  0.504  3.370  3.370  1.374  1.028     1.957
 42     3.105  0.446  1.705  1.779  3.599  2.339  0.976  0.342     1.786
 43     1.654  2.494  2.759  2.663  1.787  3.223  1.035  1.448     2.133
 44     3.511  2.258  3.356  2.604  0.564  0.549  1.175  3.533     2.194
 45     2.339  3.503  1.919  3.820  0.004  0.203  2.803  2.899     2.186
 46     1.904  0.122  0.262  1.190  0.387  1.713  2.560  2.052     1.274
 47     2.855  2.111  2.796  1.403  2.862  1.728  2.435  1.971     2.270
 48     3.599  3.937  3.525  2.177  0.269  1.175  2.994  1.926     2.450
 49     3.356  0.387  2.472  0.144  1.757  0.277  3.901  1.617     1.739
 50     2.634  3.864  2.162  0.844  3.356  0.070  3.805  2.354     2.386

The final column lists the estimate of the mean for each trial. Some estimates are less than the population mean of 2 and some are greater than 2. A total of 1000 estimates was calculated and the average was obtained as:
2.00780

The closeness of the average to 2 (the true population mean) reflects that the estimates are generated from an unbiased estimation procedure. The sampling distribution of an estimator is the distribution of the estimator in all possible samples of the same size drawn from the population. For the sample mean, the central limit theorem gives the result that the sampling distribution of the sample mean will tend to the normal distribution. To see this result, the 1000 estimates of the mean were sorted into a number of groups. The numbers of observations in each group are displayed in the histogram below.


The above histogram is centered at 2 (the value of the population mean) and the shape conforms to the shape of a normal distribution.

SHAZAM command file

The SHAZAM commands for the above demonstration are as follows.
SAMPLE 1 8
GEN1 NREP=1000
* Repeated sampling of observations from a uniform distribution
* with sample size 8
DIM SAMPMEAN NREP
SET NODOECHO NOOUTPUT RANFIX
DO #=1,NREP
* Generate the sample
GENR X=UNI(4)
* Calculate the sample mean
STAT X / MEAN=MEAN
* Save the results
MATRIX I=$DO
MATRIX RESULTS=(I|X'|MEAN)
FORMAT(1X,F5.0,8F7.3,3X,F7.3)
IF (I.LE.50) PRINT RESULTS / FORMAT NONAMES
GEN1 SAMPMEAN:#=MEAN
ENDO
* Get the average from all the replications
SET OUTPUT
SAMPLE 1 NREP
STAT SAMPMEAN / MEAN=MEAN
PRINT MEAN
* Display the sampling distribution with a histogram
GRAPH SAMPMEAN / HISTO GROUPS=10 RANGE
STOP

Measuring the Power of a Test


Suppose a sample of n observations is available from a normal population with mean μ and known variance. Consider testing the null hypothesis:

   H0: μ = 5

against the 2-sided alternative:

   H1: μ ≠ 5

The power of any test will depend on:


- the true population mean
- the sample size
- the significance level
- the population variance
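
Written out, the calculation behind each point on the power function is the following (this is a sketch of exactly what the POWER.SHA procedure below implements; Φ denotes the standard normal cumulative distribution function, z the critical value, σ the population standard deviation and n the sample size):

   acceptance region for H0:  [ 5 - z·σ/√n ,  5 + z·σ/√n ]

   β(μ) = Φ( (5 + z·σ/√n - μ) / (σ/√n) ) - Φ( (5 - z·σ/√n - μ) / (σ/√n) )

   power(μ) = 1 - β(μ)

where β(μ) is the probability of not rejecting H0 when the true mean is μ (the probability of a Type II error when μ ≠ 5).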

The above ideas can be demonstrated with an example. To compute a power function for the test a series of values for the true population mean is generated. Values are set for the sample size, the population standard deviation and the significance level. The power is then calculated for each value of the mean. The SHAZAM commands (filename: POWER.SHA) that will do the calculations are below.
* --------- SHAZAM procedure for computing the power of a test ------
PROC PFUNC
* Test H0: mean = 5 versus H1: mean not equal to 5
* Input requirements
*   SDEV: standard deviation
*   NOBS: number of observations
*   ZVAL: the critical value
GEN1 SD=[SDEV]
GEN1 N=[NOBS]
GEN1 ZCRIT=[ZVAL]
* compute the upper and lower bounds of the acceptance region
GEN1 STDERR=SD/SQRT(N)
GEN1 XLOW=5-ZCRIT*STDERR
GEN1 XUP=5+ZCRIT*STDERR
* Try 20 different values of the population mean
SAMPLE 1 20
GENR MEAN=5+TIME(-10)/100
* Find the probability of a Type II error
GENR ZLOW=(XLOW-MEAN)/STDERR
GENR ZUP=(XUP-MEAN)/STDERR
GENR BETA=NCDF(ZUP)-NCDF(ZLOW)
* Compute the power
GENR POWER=1-BETA
PROCEND
* -------------------------------------------------------------------------
SAMPLE 1 20
* sd = 0.1; n=16 and alpha = 0.05
SDEV: 0.1
NOBS: 16
ZVAL: 1.96
EXEC PFUNC
GENR POWER1=POWER
* Increase the sample size
* sd = 0.1; n=100 and alpha = 0.05
SDEV: 0.1
NOBS: 100


ZVAL: 1.96
EXEC PFUNC
GENR POWER2=POWER
* Decrease the significance level
* sd = 0.1; n=16 and alpha = 0.01
SDEV: 0.1
NOBS: 16
ZVAL: 2.576
EXEC PFUNC
GENR POWER3=POWER
* Increase the variance
* sd = 0.2; n=16 and alpha = 0.05
SDEV: 0.2
NOBS: 16
ZVAL: 1.96
EXEC PFUNC
GENR POWER4=POWER
* Print the results
PRINT MEAN POWER1-POWER4
STOP

Figure 1 shows a graph of the power function for a sample size of n=16 and n=100. The population standard deviation is 0.1 and the significance level is 0.05. The figure illustrates that an increase in sample size leads to greater power. The figure also shows that the farther the true mean is from the hypothesized value of 5 the greater the power of the test. When the true population mean is 5 the probability that we reject the null hypothesis is 0.05, the significance level of the test.

Figure 1

Figure 2 shows a graph of the power function for significance levels of 0.05 and 0.01. The sample size is n=16 and the population standard deviation is 0.1. The figure illustrates that a smaller significance level gives lower power.

Figure 2


Figure 3 shows a graph of the power function for standard deviations of 0.1 and 0.2. The sample size is n=16 and the significance level is 0.05. The figure illustrates that a larger variance gives lower power.

Figure 3

More Data Analysis


- Calculating a Real Interest Rate
- Converting Time Series Data to Different Periodicities
- Calculating a Geometric Mean

Calculating a Real Interest Rate


A method for calculating a real interest rate is described in the Appendix of the paper: Russell Davidson and James G. MacKinnon (1985), "Testing Linear and Log-linear Regressions against Box-Cox Alternatives", Canadian Journal of Economics, Vol. 18, pp. 499-517. Suppose the nominal rate of interest is I and the expected rate of inflation is PE. The real rate of interest (R) is:

   R = I - PE

An estimate of the expected rate of inflation is required. Suppose that D is a personal expenditure deflator (expressed as 1 in the base period). The inflation rate between periods t and t-1 can be estimated as:

   Pt = log(Dt) - log(Dt-1)

Davidson and MacKinnon propose that a rational predictor of the inflation rate can be obtained as the predicted values from the regression:

   Pt = β1 + β2 Zt + β3 t + et

where et is a random error and

   Zt = 0.2 Pt-1 + 0.3 Pt-2 + 0.3 Pt-3 + 0.2 Pt-4

That is, Zt is a weighted moving average of lagged values of the inflation rate. The weights are somewhat arbitrary and they sum to 1.

Example

Quarterly data for Canada on the banks' prime lending rate and the personal expenditure deflator were collected from the CANSIM Statistics Canada data base for the period 1961Q1 to 1996Q3. The data file can be viewed. The SHAZAM commands (filename: RATES.SHA) for computing the real interest rate are given below.

TIME 1961 4
SAMPLE 1961.1 1996.3
* Read the CANSIM data
READ (RATES.txt) DATE PDEFL RATE
* Compute an inflation rate
GENR PDEFL=PDEFL/100
GENR INFL=LOG(PDEFL) - LOG(LAG(PDEFL))
* Compute a weighted moving average of past inflation rates
GENR INFL4=0.2*LAG(INFL) + 0.3*LAG(INFL,2) + 0.3*LAG(INFL,3) + 0.2*LAG(INFL,4)
* Because of the lags, there are 5 undefined observations at the
* beginning of the sample.
* Adjust the sample period to exclude the undefined observations.
SAMPLE 1962.2 1996.3
* Generate a time trend
GENR TREND=TIME(0)
* Get an expected inflation rate as the predicted values
* from an OLS regression.
OLS INFL INFL4 TREND / PREDICT=EINFL
* Compute a real interest rate.
* The factor of 400 is needed to convert quarterly rates to
* annual percentage rates.
GENR RR = RATE - 400*EINFL
* Write the results to a data file.
SAMPLE 1963.1 1996.3
FORMAT(F10.1,2F8.2)
WRITE (RR.txt) DATE RATE RR / FORMAT
STOP

The SHAZAM output can be viewed. The calculated real interest rate is written to the data file RR.txt. This data file can then be used in a SHAZAM program by loading the data with the command:

READ (RR.txt) DATE RATE RR / LIST


The LIST option on the READ command gives a complete listing of the data on the SHAZAM output file. A comparison of the nominal and real interest rates for Canada is shown in the figure below.

SHAZAM output for computing real interest rates
|_TIME 1961 4
|_SAMPLE 1961.1 1996.3
|_* Read the CANSIM data
|_READ (RATES.txt) DATE PDEFL RATE
UNIT 88 IS NOW ASSIGNED TO: RATES.txt
   3 VARIABLES AND    143 OBSERVATIONS STARTING AT OBS 1
|_* Compute an inflation rate
|_GENR PDEFL=PDEFL/100
|_GENR INFL=LOG(PDEFL) - LOG(LAG(PDEFL))
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
...WARNING...ILLEGAL LOG IN OBS. 1, VALUE REPLACED BY ZERO
|_* Compute a weighted moving average of past inflation rates
|_GENR INFL4=0.2*LAG(INFL) + 0.3*LAG(INFL,2) + 0.3*LAG(INFL,3) + 0.2*LAG(INFL,4)
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
|_* Because of the lags, there are 5 undefined observations at the
|_* beginning of the sample.
|_* Adjust the sample period to exclude the undefined observations.
|_SAMPLE 1962.2 1996.3
|_* Generate a time trend
|_GENR TREND=TIME(0)
|_* Get an expected inflation rate as the predicted values
|_* from an OLS regression.
|_OLS INFL INFL4 TREND / PREDICT=EINFL

OLS ESTIMATION
138 OBSERVATIONS     DEPENDENT VARIABLE = INFL
...NOTE..SAMPLE RANGE SET TO: 6, 143

R-SQUARE = .6650    R-SQUARE ADJUSTED = .6600
VARIANCE OF THE ESTIMATE-SIGMA**2 = .20972E-04
STANDARD ERROR OF THE ESTIMATE-SIGMA = .45795E-02
SUM OF SQUARED ERRORS-SSE= .28312E-02
MEAN OF DEPENDENT VARIABLE = .11885E-01
LOG OF THE LIKELIHOOD FUNCTION = 548.994

VARIABLE   ESTIMATED    STANDARD    T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       135 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
INFL4       .87754      .5416E-01    16.20   .000      .813     .8083        .8755
TREND      -.14247E-04  .9800E-05   -1.454   .148     -.124    -.0725        .0893
CONSTANT    .25416E-02  .1072E-02    2.372   .019      .200     .0000        .2139
|_* Compute a real interest rate.
|_* The factor of 400 is needed to convert quarterly rates to
|_* annual percentage rates.
|_GENR RR = RATE - 400*EINFL
|_* Write the results to a data file.
|_SAMPLE 1963.1 1996.3
|_FORMAT(F10.1,2F8.2)
|_WRITE (RR.txt) DATE RATE RR / FORMAT
UNIT 88 IS NOW ASSIGNED TO: RR.txt
|_STOP

Converting Time Series Data to Different Periodicities


Statistical agencies may report time series data on a monthly basis. This data can be converted to quarterly data or annual data by using a suitable conversion method. For example, a monthly price index can be converted to a quarterly price index by computing 3-month averages. Monthly data on the volume of retail sales can be converted to quarterly or annual data by summation.

With SHAZAM, the SUM function on the GENR command can be used to convert time series data to different periodicities. The command format is:

GENR newvar=SUM(var,n)

The SUM function sums up n successive observations on the variable var starting at observation 1.

Example

Suppose monthly data for variable VARM is available for the sample period 1986M1 to 1996M12 (132 monthly observations). The SHAZAM commands below convert the data to quarterly data by using 3-month averages.

SAMPLE 1 132
READ (file) DATEM VARM
* Set the sample to the number of quarterly observations
SAMPLE 1 44
* Convert to quarterly data by computing 3-month averages
GENR VARQ=SUM(VARM,3)/3
* Generate a quarterly date variable starting in 1986Q1
TIME 1986 4 DATEQ
* Write the results to a data file
WRITE (newfile) DATEQ VARQ
STOP

The WRITE command is used to save the new data to a data file.
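
The same approach handles other periodicities. A minimal sketch for converting the 132 monthly observations to 11 annual figures (the variable names VARA and VARAAVG are hypothetical, chosen here for illustration):

SAMPLE 1 132
READ (file) DATEM VARM
* Set the sample to the number of annual observations (132/12 = 11)
SAMPLE 1 11
* Annual totals by summation over 12 months
GENR VARA=SUM(VARM,12)
* Or annual averages, appropriate when the series is an index
GENR VARAAVG=SUM(VARM,12)/12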

Calculating a Geometric Mean


Consider a sample of positive numbers x1, x2, ..., xn. The geometric mean is:

   Gn = (x1 × x2 × ... × xn)^(1/n)

Let An be the arithmetic mean. A result from calculus is:

   Gn ≤ An

with Gn = An if and only if x1 = x2 = ... = xn. The logarithm of the geometric mean is the arithmetic mean of the log-transformed data:

   log(Gn) = [ log(x1) + log(x2) + ... + log(xn) ] / n

The geometric mean is an appropriate measure of central tendency when averages of rates or index numbers are required. For example, suppose that in three successive years the return on an investment is 5%, 20% and -4%. The average rate of return can be found as the geometric mean:

   ((1.05)(1.20)(0.96))^(1/3) = 1.065

Therefore, the average rate of return (compound annual growth rate) is 6.5%.

Example

This example is from J. Freund, F. Williams, B. Perles and C. Sullivan [Modern Business Statistics, Prentice-Hall, 1969, Chapter 16]. Price indexes are available for seven food commodities. The indexes express the 1968 price as a percentage of the 1956 price. The SHAZAM commands (filename: GMEAN.SHA) that compute and compare the arithmetic mean with the geometric mean are given below. The geometric mean is calculated from the arithmetic mean of the log-transformed data.

SAMPLE 1 7
READ PRICE / BYVAR LIST
137 146 163 98 144 292 119
* Calculate the arithmetic mean and median
STAT PRICE / MEAN=AN PMEDIAN
* Calculate the geometric mean
GENR LPRICE=LOG(PRICE)
STAT LPRICE / MEAN=LGN
GEN1 GN=EXP(LGN)
PRINT AN GN
STOP

The SHAZAM output can be viewed. The output shows the following results:

   Arithmetic Mean   157
   Median            144
   Geometric Mean    149
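
As an aside, the 1.065 figure in the three-year return example above can be reproduced by using SHAZAM as a calculator. A minimal sketch (GEN1 operates on single values; the variable name G is hypothetical):

* Quick check of the geometric mean of three gross returns
GEN1 G=(1.05*1.20*0.96)**(1.0/3.0)
PRINT G

The printed value of G should be approximately 1.0655, confirming an average return of about 6.5% per year.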


SHAZAM output


|_SAMPLE 1 7
|_READ PRICE / BYVAR LIST
   1 VARIABLES AND      7 OBSERVATIONS STARTING AT OBS 1
PRICE
   137.0000   146.0000   163.0000   98.00000   144.0000
   292.0000   119.0000

|_* Calculate the arithmetic mean and median
|_STAT PRICE / MEAN=AN PMEDIAN
NAME     N    MEAN     ST. DEV   VARIANCE   MINIMUM   MAXIMUM
PRICE    7    157.00   63.082    3979.3     98.000    292.00

VARIABLE = PRICE
MEDIAN = 144.00   LOWER 25%= 119.00   UPPER 25%= 163.00
INTERQUARTILE RANGE= 44.00
MODE NOT APPLICABLE

|_* Calculate the geometric mean
|_GENR LPRICE=LOG(PRICE)
|_STAT LPRICE / MEAN=LGN
NAME      N    MEAN     ST. DEV   VARIANCE   MINIMUM   MAXIMUM
LPRICE    7    5.0011   0.34044   0.11590    4.5850    5.6768
|_GEN1 GN=EXP(LGN)
|_PRINT AN GN
AN    157.0000
GN    148.5828
|_STOP

PART II - Practicing Econometrics


Ordinary Least Squares
 - Ordinary Least Squares Regression
 - Comparing linear vs. log-linear models
 - Confidence Intervals
 - Hypothesis Testing
 - Estimation with restrictions
 - Prediction

Ordinary Least Squares Regression

The OLS command will estimate the parameters of a linear regression equation by the method of ordinary least squares. The general command format is:

OLS depvar indeps / options

where depvar is the dependent variable, indeps is a list of the explanatory variables and options is a list of desired options. There are many useful options on the OLS command and some of these will be illustrated in this guide.

Examples

- 2-variable Regression Analysis
   o Interpreting t-ratios
   o Interpreting p-values
   o Interpreting elasticities
- The General Linear Statistical Model - Multiple Regression Analysis

Appendixes

- Computing p-values for test statistics
- A Monte Carlo experiment to demonstrate the properties of the OLS estimator


2-variable Regression Analysis

This example uses the Griffiths, Hill and Judge data set on household expenditure for food. Consider a simple linear regression with FOOD as the dependent variable and INCOME as the explanatory variable. The following SHAZAM program reads the data from the file GHJ.txt, assigns variable names and runs the regression. Note that the READ command assumes that the data file is in the current directory (or folder).
SAMPLE 1 40
READ (GHJ.txt) FOOD INCOME
OLS FOOD INCOME
STOP

The output file of results follows.


|_SAMPLE 1 40
|_READ (GHJ.txt) FOOD INCOME
UNIT 88 IS NOW ASSIGNED TO: GHJ.txt
   2 VARIABLES AND     40 OBSERVATIONS STARTING AT OBS 1
|_OLS FOOD INCOME

OLS ESTIMATION
40 OBSERVATIONS     DEPENDENT VARIABLE = FOOD
...NOTE..SAMPLE RANGE SET TO: 1, 40

R-SQUARE = .3171    R-SQUARE ADJUSTED = .2991
VARIANCE OF THE ESTIMATE-SIGMA**2 = 46.853
STANDARD ERROR OF THE ESTIMATE-SIGMA = 6.8449
SUM OF SQUARED ERRORS-SSE= 1780.4
MEAN OF DEPENDENT VARIABLE = 23.595
LOG OF THE LIKELIHOOD FUNCTION = -132.672

VARIABLE   ESTIMATED    STANDARD   T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR      38 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME      .23225      .5529E-01  4.200   .000      .563     .5631        .6871
CONSTANT    7.3832      4.008      1.842   .073      .286     .0000        .3129
|_STOP

SHAZAM automatically includes an intercept coefficient in the regression and this is given the name CONSTANT. On the SHAZAM output, the intercept estimate is listed as the final coefficient estimate. The results show that the estimated coefficient on INCOME (the slope coefficient) is 0.23225 and the intercept estimate is 7.3832. The estimated equation can be written as:
   FOOD = 7.38 + 0.232 INCOME + ê

where ê is the estimated residual. The figure below shows a scatterplot of the observations and the estimated regression line. (This figure corresponds to Figure 5.9 of Griffiths, Hill and Judge [1993, p. 187]).

The LIST option on the OLS command will give more extensive output that includes a listing of the estimated residuals and the predicted values for the dependent variable. The use of the LIST option is shown with the SHAZAM command:
OLS FOOD INCOME / LIST

The interested reader can look at the SHAZAM output generated with the LIST option.

Interpreting t-ratios

The OLS estimation results report the ESTIMATED COEFFICIENT and the estimated STANDARD ERROR. With the assumption that the errors are normally distributed these estimates can be used for hypothesis testing purposes. In the above example, a useful question to ask is: Is the estimated coefficient on INCOME significantly different from zero? That is, does household income have an effect on the level of household expenditure for food? To help answer this question the SHAZAM output reports the test statistic:
T-RATIO = ESTIMATED COEFFICIENT / STANDARD ERROR

The estimated coefficient is significantly different from zero (that is, the null hypothesis of a zero coefficient is rejected) if the t-ratio is "relatively large". The critical value is obtained from tables for the t-distribution with N-K degrees of freedom (N is the number of observations and K is the number of estimated coefficients). These tables are usually printed in the appendix to econometrics textbooks.

For the household food expenditure example the reported t-ratio for the coefficient on INCOME is 4.20. The number of observations is 40 and the number of estimated coefficients is 2, so the degrees of freedom (DF) is 38. By choosing a significance level of 5% and considering a two-sided test (so that the critical region in each tail is 2.5%) the critical value obtained from printed tables is 2.024. (Note that this critical value was approximated using the tabulated values for 30 and 40 degrees of freedom that are reported in the tables.) In absolute value, the t-ratio exceeds this critical value. Therefore, there is strong evidence to conclude that the estimated coefficient on INCOME is significantly different from zero.

Interpreting p-values

When interpreting t-ratios it can be inconvenient to consult statistical tables. To assist the user, SHAZAM reports the P-VALUE on the OLS estimation output. This value is computed as the tail probability for a two-tail test of the null hypothesis that the coefficient is 0. This is the probability of a Type I error - the probability of rejecting a true hypothesis. The null hypothesis is rejected if the p-value is "small" (say smaller than 0.10, 0.05 or 0.01). For example, if the p-value is 0.078, this means that the null hypothesis cannot be rejected at a 5% significance level but can be rejected at a 10% significance level.

Note: SHAZAM only reports three decimal places for the p-value. So a value that is reported as .000 actually means a value less than .0005. This can be interpreted as meaning that the null hypothesis of a zero coefficient is rejected at any reasonable significance level. It is possible to use SHAZAM commands to compute p-values for test statistics.

Interpreting elasticities

For the household food expenditure relationship the estimated coefficient on INCOME measures the marginal effect. This gives the amount by which FOOD changes in response to a one unit change in INCOME. Another measure of interest to economists is elasticity. This gives the percentage change in the dependent variable that results from a 1% change in the explanatory variable. The final column on the SHAZAM OLS estimation output reports the ELASTICITY AT MEANS. For the example illustrated here, let B1 be the estimated coefficient on INCOME and let CM and PM be the sample means of FOOD and INCOME respectively. The income elasticity evaluated at the sample means is computed as:
B1 (PM/CM) = 0.6871


When interpreting the meaning of the estimated coefficients and the elasticities users should take careful note of the units of measurement of the variables in the regression equation.
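
As noted above, SHAZAM commands can compute p-values and critical values directly, so table lookups can be avoided. A minimal sketch for the INCOME t-ratio of this example, using the DISTRIB command (the INVERSE/CRITICAL= usage appears later in this guide; the CDF= option for saving the cumulative probability is an assumption here, and the variable names are hypothetical):

SAMPLE 1 1
* critical value for a 5% two-sided test with 38 degrees of freedom
GEN1 A2=.025
DISTRIB A2 / INVERSE TYPE=T DF=38 CRITICAL=TC
* two-tailed p-value for the reported t-ratio of 4.200
GEN1 T=4.200
DISTRIB T / TYPE=T DF=38 CDF=CP
GEN1 PVAL=2*(1-CP)
PRINT TC PVAL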



The LIST option

The SHAZAM output that follows shows the use of the LIST option on the OLS command.
|_OLS FOOD INCOME / LIST

OLS ESTIMATION
40 OBSERVATIONS     DEPENDENT VARIABLE = FOOD
...NOTE..SAMPLE RANGE SET TO: 1, 40

R-SQUARE = .3171    R-SQUARE ADJUSTED = .2991
VARIANCE OF THE ESTIMATE-SIGMA**2 = 46.853
STANDARD ERROR OF THE ESTIMATE-SIGMA = 6.8449
SUM OF SQUARED ERRORS-SSE= 1780.4
MEAN OF DEPENDENT VARIABLE = 23.595
LOG OF THE LIKELIHOOD FUNCTION = -132.672

VARIABLE   ESTIMATED    STANDARD   T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR      38 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME      .23225      .5529E-01  4.200   .000      .563     .5631        .6871
CONSTANT    7.3832      4.008      1.842   .073      .286     .0000        .3129

OBS. NO.   OBSERVED VALUE   PREDICTED VALUE   CALCULATED RESIDUAL
    1         9.4600           13.382             -3.9223
    2        10.560            15.352             -4.7918
    3        14.810            17.254             -2.4440
    4        21.710            18.241              3.4689
    5        22.790            18.599              4.1913
    6        18.190            18.710              -.52021
    7        22.000            18.915              3.0854
    8        18.120            19.446             -1.3265
    9        23.130            20.002              3.1285
   10        19.000            20.127             -1.1270
   11        19.460            20.496             -1.0362
   12        17.830            21.047             -3.2167
   13        32.810            21.116             11.694
   14        22.130            21.488               .64204
   15        23.460            21.579              1.8815
   16        16.810            22.038             -5.2284
   17        21.350            22.703             -1.3526
   18        14.870            22.805             -7.9348
   19        33.000            23.738              9.2615
   20        25.190            23.752              1.4376
   21        17.770            24.101             -6.3308
   22        22.440            24.105             -1.6655
   23        22.870            24.159             -1.2889
   24        26.520            24.159              2.3611
   25        21.000            24.440             -3.4399
   26        37.520            24.628             12.892
   27        21.690            24.749             -3.0588
   28        27.400            25.111              2.2889
   29        30.690            26.200              4.4896
   30        19.560            26.393             -6.8332
   31        30.580            26.558              4.0219
   32        41.120            26.737             14.383
   33        15.380            26.753            -11.373
   34        17.870            28.706            -10.836
   35        25.540            28.706             -3.1664
   36        39.000            28.973             10.027
   37        20.440            29.487             -9.0468
   38        30.100            30.934              -.83371
   39        20.900            33.890            -12.990
   40        48.710            34.199             14.511

(A rough plot of the residuals that SHAZAM prints beside this table is omitted here.)

DURBIN-WATSON = 2.3703   VON NEUMANN RATIO = 2.4310   RHO = -.28193
RESIDUAL SUM = -.36060E-12   RESIDUAL VARIANCE = 46.853
SUM OF ABSOLUTE ERRORS= 207.53
R-SQUARE BETWEEN OBSERVED AND PREDICTED = .3171
RUNS TEST: 22 RUNS, 17 POS, 0 ZERO, 23 NEG   NORMAL STATISTIC = .4755
|_STOP

The LIST option displays a table of results that contains the following:

OBSERVED VALUE        The observed value of the dependent variable.
PREDICTED VALUE       The predicted value (also called estimated value or fitted
                      value) of the dependent variable.
CALCULATED RESIDUAL   The difference between the observed and predicted values.

The right hand side of the output displays a rough plot of the residuals. A property of ordinary least squares regression (when an intercept is included) is that the sum of the estimated residuals (and hence the mean of the estimated residuals) is 0. Note that the final part of the SHAZAM output reports:
RESIDUAL SUM = -.36060E-12

That is, SHAZAM computes the sum of residuals as -.00000000000036060. This shows that computer calculations can have some imprecision. Different computers may have numerical differences in the reporting of this result.

Comparing linear vs. log-linear models


An equation that specifies a linear relationship among the variables gives an approximate description of some economic behaviour. An alternative approach is to consider a linear relationship among log-transformed variables. This is a log-log model - the dependent variable as well as all explanatory variables are transformed to logarithms. Since the relationship among the log variables is linear some researchers call this a log-linear model.

Different functional forms give parameter estimates that have different economic interpretations. The parameters of the linear model have an interpretation as marginal effects; the implied elasticities will vary depending on the data. In contrast, the parameters of the log-log model have an interpretation as elasticities. So the log-log model assumes a constant elasticity over all values of the data set.

The log transformation is only applicable when all the observations in the data set are positive. Gujarati [Basic Econometrics, Third Edition, 1995, McGraw-Hill, p.387] notes that this can be guaranteed by using a transformation like log(X+k) where k is a positive scalar chosen to ensure positive values. However, users will then need to give careful thought to the interpretation of the parameter estimates.

For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate competing models by OLS and choose the model with the highest R-square. SHAZAM computes the R-square as:

   R2 = 1 - SSE / SST

where SSE is the sum of squared estimated residuals and SST is the sum of squared deviations from the mean of the dependent variable. An equivalent computation is the squared coefficient of correlation between the observed and predicted values of the dependent variable. (It may be useful to verify this as an exercise; a sketch is given at the end of this section.)

An R-square comparison is meaningful only if the dependent variable is the same for both models. So the R-square from the linear model cannot be compared with the R-square from the log-log model. That is, the R-square measure gives the proportion of variation in the dependent variable that is explained by the explanatory variables; for the log-log model the R-square gives the amount of variation in ln(Y) that is explained by the model. For comparison purposes we would like a measure that uses the anti-log of ln(Y). For the log-log model, the way to proceed is to obtain the antilog predicted values and compute the R-square between the antilogs of the observed and predicted values. This R-square can then be compared with the R-square obtained from OLS estimation of the linear model.

When estimating a log-log model the following two options can be used on the OLS command.
LOGLOG   This option tells SHAZAM that the dependent variable and explanatory variables have been transformed to logarithms. SHAZAM reports elasticities that are identical to the estimated coefficients.

RSTAT    This option computes a number of residual statistics. When the LOGLOG option is also specified the SHAZAM output will report the R-square between the antilog of the observed and predicted values. This can be used for comparison with the R-square obtained from the linear model.

Example

This example uses the Theil textile data set. The SHAZAM commands (filename: LINLOG.SHA) below estimate a linear demand equation. Log-transformed variables are then generated and the log-log model is estimated by the method of ordinary least squares.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Obtain parameter estimates for the linear model
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Use the GENR command to get logarithms of the variables
GENR LC=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LP=LOG(PRICE)
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
*
* Print results
PRINT YEAR CONSUME YHAT1 YHAT2
STOP

Note that on the OLS estimation commands the PREDICT= option is used to save the predicted values in the variable specified. The predicted values from the linear model are saved in the variable assigned the name YHAT1. The predicted values from the log-log model are saved in the variable named YHAT2. From the log-log model estimation, predictions for CONSUME are constructed by taking antilogs. More details on computing antilog predictions are available. The SHAZAM output can be inspected. The SHAZAM output from the linear model gives the result:
R-SQUARE = .9513

The SHAZAM output from the log-log model gives the result:
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689

In this example, the R-square for the log-log model is higher - so there is some evidence to prefer the log-log specification. Users may be interested in more formal procedures for testing between the linear and log-log model specification. Test procedures have been proposed by various researchers. Other functional forms can be considered. The Box-Cox transformation creates a general functional form where both the linear model and log-log model are special cases. Features for estimating this model are described in the chapter on Box-Cox regression in the SHAZAM User's Reference Manual.
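
As promised above, the claim that the R-square equals the squared correlation between the observed and predicted values of the dependent variable can be checked directly. A minimal sketch using the same Theil data and only commands already used in this guide (the squared CONSUME-YHAT1 correlation read from the PCOR output should match the reported R-SQUARE = .9513):

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Print the correlation matrix; square the CONSUME-YHAT1 entry by hand
STAT CONSUME YHAT1 / PCOR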

Computing antilog predictions

In the above example, the log-log model is estimated and the antilog predictions are computed with the commands:
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)


When constructing the antilog predictions some consideration should be given to using an unbiased predictor. A result from statistical theory is that if a random variable Y is normally distributed with mean μ and variance σ² then the random variable Z defined as Z = exp(Y) has mean:

   exp(μ + σ²/2)

(See, for example, Mood, Graybill and Boes [1974] and Ramanathan [1995, p.271]). Therefore, it is important to include an estimate of σ²/2 in the computation of the antilog predictions. On the OLS estimation output the estimated error variance is reported on the line VARIANCE OF THE ESTIMATE-SIGMA**2. After model estimation this estimate is available in the temporary variable with the special name $SIG2. The GENR command for constructing the antilog predictions includes this in the calculation.
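
To see what the bias adjustment does in practice, the naive antilog prediction can be computed alongside the adjusted one. A minimal sketch, to be run directly after the log-log OLS command above in place of the overwriting GENR (the names YNAIVE and YADJ are hypothetical):

* naive antilog prediction (no bias adjustment)
GENR YNAIVE=EXP(YHAT2)
* bias-adjusted antilog prediction
GENR YADJ=EXP(YHAT2+$SIG2/2)
PRINT YNAIVE YADJ

Since $SIG2 is small in this example (.97236E-03) the two sets of predictions differ by only about 0.05 percent, but with a larger error variance the adjustment matters.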



SHAZAM output for the comparison of linear and log-log models


|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND     17 OBSERVATIONS STARTING AT OBS 1
|_* Obtain parameter estimates for the linear model
|_OLS CONSUME INCOME PRICE / PREDICT=YHAT1

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO: 1, 17

R-SQUARE = .9513    R-SQUARE ADJUSTED = .9443
VARIANCE OF THE ESTIMATE-SIGMA**2 = 30.951
STANDARD ERROR OF THE ESTIMATE-SIGMA = 5.5634
SUM OF SQUARED ERRORS-SSE= 433.31
MEAN OF DEPENDENT VARIABLE = 134.51
LOG OF THE LIKELIHOOD FUNCTION = -51.6471

VARIABLE   ESTIMATED    STANDARD    T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       14 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME      1.0617      .2667        3.981  .001      .729     .2387        .8129
PRICE      -1.3830      .8381E-01  -16.50   .000     -.975    -.9893        -.7846
CONSTANT    130.71      27.09        4.824  .000      .790     .0000        .9718

|_* Use the GENR command to get logarithms of the variables
|_GENR LC=LOG(CONSUME)
|_GENR LINC=LOG(INCOME)
|_GENR LP=LOG(PRICE)
|_* Obtain parameter estimates for the log-log model
|_OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = LC
...NOTE..SAMPLE RANGE SET TO: 1, 17

R-SQUARE = .9744    R-SQUARE ADJUSTED = .9707
VARIANCE OF THE ESTIMATE-SIGMA**2 = .97236E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA = .31183E-01
SUM OF SQUARED ERRORS-SSE= .13613E-01
MEAN OF DEPENDENT VARIABLE = 4.8864
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

VARIABLE   ESTIMATED    STANDARD    T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       14 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
LINC        1.1432      .1560        7.328  .000      .891     .3216        1.1432
LP         -.82884      .3611E-01  -22.95   .000     -.987    -1.0074       -.8288
CONSTANT    3.1636      .7048        4.489  .001      .768     .0000        3.1636

DURBIN-WATSON = 1.9267   VON NEUMANN RATIO = 2.0471   RHO = -.11385
RESIDUAL SUM = .10769E-13   RESIDUAL VARIANCE = .97236E-03
SUM OF ABSOLUTE ERRORS= .40583
R-SQUARE BETWEEN OBSERVED AND PREDICTED = .9744
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689
RUNS TEST: 9 RUNS, 9 POS, 0 ZERO, 8 NEG   NORMAL STATISTIC = -.2366

|_* Obtain the antilog predicted values (include a bias adjustment)
|_GENR YHAT2=EXP(YHAT2+$SIG2/2)
..NOTE..CURRENT VALUE OF $SIG2=   .97236E-03
|_*
|_* Print results
|_PRINT YEAR CONSUME YHAT1 YHAT2
    YEAR        CONSUME       YHAT1         YHAT2
  1923.000      99.20000      93.69238      96.05522
  1924.000      99.00000      96.42346      98.37372
  1925.000      100.0000      98.57900      100.6381
  1926.000      111.6000      116.7814      115.3575
  1927.000      122.2000      122.4517      119.8714
  1928.000      117.6000      122.9100      122.1649
  1929.000      121.1000      123.0455      122.8039
  1930.000      136.0000      135.4254      134.3674
  1931.000      154.2000      149.8042      149.5499
  1932.000      153.6000      152.0574      151.7951
  1933.000      158.5000      153.9054      153.9190
  1934.000      140.6000      145.5571      140.7879
  1935.000      136.2000      145.0975      140.4307
  1936.000      168.0000      161.5844      166.7092
  1937.000      154.3000      156.8614      158.5688
  1938.000      149.0000      156.2887      157.5912
  1939.000      165.5000      156.1350      157.5576
|_STOP



Testing the Linear versus Log-log Model

Various methods for testing the linear versus log-log model have been proposed. Some discussion is in Maddala [1992, pp.222-3]. A test procedure is described in Griffiths, Hill and Judge [1993, pp.345-6]. SHAZAM has the flexibility for the user to program these tests with SHAZAM commands. Additional references that can be consulted are:
G. E. P. Box and D. R. Cox, "An Analysis of Transformations", Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211-243.

R. Davidson and J.G. MacKinnon, "Testing Linear and Log-linear Regressions against Box-Cox Alternatives", Canadian Journal of Economics, Vol. 18, 1985, pp. 499-517.

L.G. Godfrey and M.R. Wickens, "Testing Linear and Log-linear Regressions for Functional Form", Review of Economic Studies, 1981, pp. 487-496.

J.G. MacKinnon, H. White and R. Davidson, "Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results", Journal of Econometrics, Vol. 21, 1983, pp. 53-70.

Confidence Intervals

The CONFID command computes confidence intervals using the estimated coefficients and standard errors from the previous estimation. With OLS estimation, the general format of commands is:
OLS depvar indeps
CONFID indeps / options

where indeps is a list of the explanatory variables and options is a list of desired options. On the CONFID command, the variable names actually represent the coefficients for which interval estimates are required. The variable names listed on the CONFID command must be entered as explanatory variables on the estimation command. The variable name CONSTANT can also be specified to obtain an interval estimate for the intercept coefficient. A useful option is:
TCRIT=   Specifies the t-distribution critical value for calculating confidence intervals. If this option is not specified then SHAZAM computes 90% and 95% confidence intervals.

Suppose the regression equation has N observations and K coefficients. For a parameter estimate b with estimated standard error se(b) the 100(1-α)% confidence interval estimate is:

   [ b - t* se(b) ,  b + t* se(b) ]

where t* is the α/2 critical value from a t-distribution with N-K degrees of freedom. SHAZAM computes 90% and 95% confidence intervals using the critical values of the t-distribution that are tabulated in econometrics textbooks. Alternatively, the user can specify critical values with the TCRIT= option on the CONFID command.

Example

This example uses the Theil textile data set. The textile demand equation is specified in log-log form. The commands (filename: CONFID.SHA) below obtain 90% and 95% interval estimates for the coefficients.
SAMPLE 1 17


READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Transform the data to logarithms
GENR LC=LOG(CONSUME)
GENR LY=LOG(INCOME)
GENR LP=LOG(PRICE)
* Estimate the log-log model
OLS LC LY LP / LOGLOG
CONFID LY LP CONSTANT
STOP

The CONFID command specifies the variable name CONSTANT. SHAZAM will then compute an interval estimate for the intercept coefficient as well as for the slope coefficients. The SHAZAM output can be viewed. The estimated coefficient on the variable LP has an interpretation as a price elasticity. The estimation results show that the point estimate for the price elasticity is: -.83. (For presentation purposes the numerical results are rounded to 2 decimal places). The 90% interval estimate is:
[-.89, -.77]

The 95% interval estimate for the price elasticity is:


[-.91, -.75]
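
These interval estimates can also be reproduced by hand from the reported coefficient, standard error and critical value. A minimal sketch for the 95% interval on the price elasticity (the numbers are copied from the output above; 2.145 is the 2.5% critical value with 14 degrees of freedom, and the variable names are hypothetical):

GEN1 B=-.82884
GEN1 SE=.036111
GEN1 TC=2.145
GEN1 LOWER=B-TC*SE
GEN1 UPPER=B+TC*SE
PRINT LOWER UPPER

This reproduces the [-.91, -.75] interval reported by the CONFID command.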

The next example computes 99% interval estimates. The DISTRIB command is used to obtain an appropriate critical value for the interval estimates. The general command format for obtaining critical values of the t-distribution is:
DISTRIB prob / INVERSE TYPE=T DF=df CRITICAL=crit

where prob is a variable that contains tail area probabilities and df is the degrees of freedom. The CRITICAL= option can be used to save the critical values in the variable specified. This is shown in the commands (filename: CONFID1.SHA) below.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Transform the data to logarithms
GENR LC=LOG(CONSUME)
GENR LY=LOG(INCOME)
GENR LP=LOG(PRICE)
* Get the critical value to use for interval estimates
SAMPLE 1 1
GEN1 ALPHA=.01
GEN1 A2=ALPHA/2
DISTRIB A2 / INVERSE TYPE=T DF=14 CRITICAL=Z
* Compute point estimates and interval estimates
SAMPLE 1 17
OLS LC LY LP / LOGLOG
CONFID LY LP CONSTANT / TCRIT=Z
STOP


The SHAZAM output can be viewed. The DISTRIB command uses a numerical algorithm to compute critical values. The critical value is computed as 2.9774. When rounded to 3 decimal places, this is identical to the value given in statistical tables in textbooks. The 99% interval estimate for the price elasticity is:
[-.94, -.72]

SHAZAM output - 90% and 95% interval estimates
|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND     17 OBSERVATIONS STARTING AT OBS 1
|_* Transform the data to logarithms
|_GENR LC=LOG(CONSUME)
|_GENR LY=LOG(INCOME)
|_GENR LP=LOG(PRICE)
|_* Estimate the log-log model
|_OLS LC LY LP / LOGLOG

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = LC
...NOTE..SAMPLE RANGE SET TO: 1, 17

R-SQUARE = .9744    R-SQUARE ADJUSTED = .9707
VARIANCE OF THE ESTIMATE-SIGMA**2 = .97236E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA = .31183E-01
SUM OF SQUARED ERRORS-SSE= .13613E-01
MEAN OF DEPENDENT VARIABLE = 4.8864
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

VARIABLE   ESTIMATED    STANDARD    T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       14 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
LY          1.1432      .1560        7.328  .000      .891     .3216        1.1432
LP         -.82884      .3611E-01  -22.95   .000     -.987    -1.0074       -.8288
CONSTANT    3.1636      .7048        4.489  .001      .768     .0000        3.1636

|_CONFID LY LP CONSTANT
USING 95% AND 90% CONFIDENCE INTERVALS
CONFIDENCE INTERVALS BASED ON T-DISTRIBUTION WITH 14 D.F.
 - T CRITICAL VALUES = 2.145 AND 1.761
NAME       LOWER 2.5%  LOWER 5%   COEFFICIENT  UPPER 5%   UPPER 2.5%  STD. ERROR
LY          .8085       .8684      1.1432       1.418      1.478       .156
LP         -.9063      -.8924      -.82884      -.7652     -.7514      .036
CONSTANT    1.652       1.922      3.1636       4.405      4.675       .705
|_STOP





SHAZAM output - 99% interval estimates for regression coefficients


|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND     17 OBSERVATIONS STARTING AT OBS 1
|_* Transform the data to logarithms
|_GENR LC=LOG(CONSUME)
|_GENR LY=LOG(INCOME)
|_GENR LP=LOG(PRICE)
|_* Get the critical value to use for interval estimates
|_SAMPLE 1 1
|_GEN1 ALPHA=.01
|_GEN1 A2=ALPHA/2
|_DISTRIB A2 / INVERSE TYPE=T DF=14 CRITICAL=Z
T DISTRIBUTION   DF= 14.000   VARIANCE= 1.1667   H= 1.0000
A2
 ROW   PROBABILITY    CRITICAL VALUE    PDF
  1    .50000E-02     2.9774            .98931E-02

|_* Compute point estimates and interval estimates
|_SAMPLE 1 17
|_OLS LC LY LP / LOGLOG

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = LC
...NOTE..SAMPLE RANGE SET TO: 1, 17

R-SQUARE = .9744    R-SQUARE ADJUSTED = .9707
VARIANCE OF THE ESTIMATE-SIGMA**2 = .97236E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA = .31183E-01
SUM OF SQUARED ERRORS-SSE= .13613E-01
MEAN OF DEPENDENT VARIABLE = 4.8864
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

VARIABLE   ESTIMATED    STANDARD    T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       14 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
LY          1.1432      .1560        7.328  .000      .891     .3216        1.1432
LP         -.82884      .3611E-01  -22.95   .000     -.987    -1.0074       -.8288
CONSTANT    3.1636      .7048        4.489  .001      .768     .0000        3.1636

|_CONFID LY LP CONSTANT / TCRIT=Z
CONFIDENCE INTERVALS BASED ON T-DISTRIBUTION WITH 14 D.F.
 - T CRITICAL VALUE = 2.977
NAME       LOWER      COEFFICIENT   UPPER      STD. ERROR
LY          .67868     1.1432        1.6076     .15600
LP         -.93636     -.82884       -.72132    .36111E-01
CONSTANT    1.0651     3.1636        5.2620     .70480
|_STOP

Hypothesis Testing

The standard OLS estimation output from SHAZAM reports a t-ratio for testing the null hypothesis that the true regression coefficient is zero. When the regression equation contains more than 1 explanatory variable it may be of interest to test the null hypothesis that all slope coefficients are jointly equal to zero. This is called a test of the overall significance of the regression line. The F-test statistic for this test is computed with the ANOVA option on the OLS command. In practice, the economist is likely to be interested in other types of hypotheses that may involve linear (or nonlinear) combinations of the regression coefficients.

Testing a single linear combination of coefficients

Test statistics are computed with the TEST command that immediately follows the estimation command. With OLS estimation, the general format of commands for testing a single hypothesis is:
OLS depvar indeps / options
TEST equation

The equation is specified as a function of the variables in the indeps list on the OLS command. Note: The variable names actually represent the coefficients involved in the hypothesis test. If a hypothesis test involving the intercept coefficient is required then the name CONSTANT can be used to represent the intercept. The SHAZAM output reports a t-test statistic and a p-value for a 2-sided test. The null hypothesis can be rejected if the p-value is less than a selected level of significance (say, 0.05).

One-tailed tests can also be considered. For example, consider testing hypotheses about some unknown parameter θ. Suppose the null and alternative hypotheses are:

   H0: θ ≤ c   and   H1: θ > c

where c is some scalar constant. The test statistic for the one-tailed test is computed in the same way as for a two-tailed test. However, the null hypothesis will be rejected only if the value of the test statistic is excessively large (giving support to the alternative hypothesis). Suppose that p is the p-value reported for the two-tailed test. The p-value for the inequality hypotheses stated above can be computed as follows: If the test statistic is positive the p-value is p/2. If the test statistic is negative the p-value is 1-p/2.

Testing more than one linear combination of coefficients

A test statistic for a joint test that involves two or more functions of the coefficients can be obtained in SHAZAM with the general command format:
OLS depvar indeps / options
TEST
  TEST equation1
  TEST equation2
  .
  .
  .
END

The tests involved in the hypothesis are enclosed between a header that is a blank TEST command and an END command. Typically, an assumption in hypothesis testing is that the residuals are normally distributed. This assumption is then used to determine the distribution of the test statistic.

Example

This example uses the Theil textile data set to illustrate hypothesis testing in SHAZAM. The textile demand equation is specified in log-log form so that the parameter estimates have interpretations as income elasticities and price elasticities. A number of hypotheses about consumer behaviour can be tested. For example, a negative price elasticity is expected. A price elasticity that is less than 1 in absolute value implies that demand is price inelastic. The command file (filename: TEST.SHA) below transforms the data to logarithms and estimates the demand equation by OLS. A series of hypothesis tests is then considered.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Transform the data to logarithms
GENR LC=LOG(CONSUME)
GENR LY=LOG(INCOME)
GENR LP=LOG(PRICE)
* Estimate the log-log model
OLS LC LY LP / LOGLOG ANOVA
* Hypothesis testing
TEST LY=1
TEST LP=-1
*
* A joint test
TEST
  TEST LY=1
  TEST LP=-1
END
*
* Now duplicate the F-test that is reported with the ANOVA option
TEST
  TEST LY=0
  TEST LP=0
END
STOP

Note that the indentation used for the TEST commands is optional and is intended to improve the readability of the command file. Tab marks should not be used for indentation - the space bar should be used for this. The SHAZAM output can be viewed. The ANOVA option on the OLS command produces the following output.

               ANALYSIS OF VARIANCE - FROM MEAN
               SS           DF    MS            F          P-VALUE
REGRESSION     .51733        2.   .25867        266.018    .000
ERROR          .13613E-01   14.   .97236E-03
TOTAL          .53094       16.   .33184E-01

A test of the null hypothesis that all slope coefficients are zero reports an F-test statistic of 266. The p-value is reported as .000 (this actually means less than .0005) and so there is strong evidence to reject the null hypothesis and conclude that the estimated relationship is a significant one. Note that a critical value for the test is obtained from an F-distribution with (2,14) degrees of freedom. Possibly more interesting tests about consumer behaviour are given with the TEST commands that follow the model estimation. The model estimation reports the following:
VARIABLE   ESTIMATED    STANDARD    T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       14 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
LY          1.1432      .1560        7.328  .000      .891     .3216        1.1432
LP         -.82884      .3611E-01  -22.95   .000     -.987    -1.0074       -.8288
CONSTANT    3.1636      .7048        4.489  .001      .768     .0000        3.1636

The income elasticity is the estimated coefficient on LY and this is reported as 1.1432. The next output shows the computation of a test statistic for the null hypothesis that the income elasticity is equal to one.
|_TEST LY=1
TEST VALUE =     .14316    STD. ERROR OF TEST VALUE   .15600
T STATISTIC =   .91766674  WITH 14 D.F.   P-VALUE= .37433

The TEST VALUE reported in the above output is obtained as 1.1432 - 1 = .1432. (In discussing the output some rounding of results is introduced). Note that the standard error of this test value is identical to the standard error for the coefficient on LY that is listed on the OLS estimation output. The t-statistic is computed as .1432 / .15600 = 0.918 . For a test of the null hypothesis against the two-sided alternative that the income elasticity is not equal to 1 the computed p-value is .37. This suggests that there is no evidence to reject the null hypothesis. For a one-sided test of the null hypothesis that the income elasticity is less than or equal to 1 against the alternative that the income elasticity is greater than 1 the p-value is 0.37433/2 = 0.187. Again, the null hypothesis is not rejected. The next output shows the computation of a test statistic for the null hypothesis that the price elasticity is equal to -1.
|_TEST LP=-1
TEST VALUE =     .17116    STD. ERROR OF TEST VALUE   .36111E-01
T STATISTIC =   4.7398530  WITH 14 D.F.   P-VALUE= .00032

The price elasticity is -.82884 and the TEST VALUE on the above output is computed as -.82884 - (-1) = .17116. The t-statistic is computed by dividing the test value by the standard error. The associated p-value gives strong evidence to reject the null hypothesis. Individual tests on the income and price elasticities have been considered. Now consider a joint test of the null hypothesis that the income elasticity is 1 and the price elasticity is -1. The output below shows the computed F-statistic for this test.
|_TEST
|_ TEST LY=1
|_ TEST LP=-1
|_END
F STATISTIC =   13.275308   WITH 2 AND 14 D.F.   P-VALUE= .00058

By consulting printed statistical tables, the 1% critical value from the F-distribution with (2,14) degrees of freedom is 6.51. The test statistic clearly exceeds this. So the null hypothesis is rejected. The p-value reported on the SHAZAM output gives this conclusion immediately.
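
The 6.51 critical value can also be generated inside SHAZAM instead of being looked up in printed tables. A minimal sketch, assuming the DISTRIB command accepts TYPE=F with DF1= and DF2= options, by analogy with the TYPE=T usage shown in the Confidence Intervals section (the variable names are hypothetical):

SAMPLE 1 1
GEN1 A=.01
DISTRIB A / INVERSE TYPE=F DF1=2 DF2=14 CRITICAL=FC
* FC should be approximately 6.51
PRINT FC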


SHAZAM output
|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND     17 OBSERVATIONS STARTING AT OBS 1
|_* Transform the data to logarithms
|_GENR LC=LOG(CONSUME)
|_GENR LY=LOG(INCOME)
|_GENR LP=LOG(PRICE)
|_* Estimate the log-log model
|_OLS LC LY LP / LOGLOG ANOVA

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = LC
...NOTE..SAMPLE RANGE SET TO: 1, 17

R-SQUARE = .9744    R-SQUARE ADJUSTED = .9707
VARIANCE OF THE ESTIMATE-SIGMA**2 = .97236E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA = .31183E-01
SUM OF SQUARED ERRORS-SSE= .13613E-01
MEAN OF DEPENDENT VARIABLE = 4.8864
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

MODEL SELECTION TESTS - SEE JUDGE ET AL. (1985,P.242)
AKAIKE (1969) FINAL PREDICTION ERROR - FPE = .11440E-02
(FPE IS ALSO KNOWN AS AMEMIYA PREDICTION CRITERION - PC)
AKAIKE (1973) INFORMATION CRITERION - LOG AIC = -6.7770
SCHWARZ (1978) CRITERION - LOG SC = -6.6300
MODEL SELECTION TESTS - SEE RAMANATHAN (1992,P.167)
CRAVEN-WAHBA (1979) GENERALIZED CROSS VALIDATION - GCV = .11807E-02
HANNAN AND QUINN (1979) CRITERION = .11565E-02
RICE (1984) CRITERION = .12376E-02
SHIBATA (1981) CRITERION = .10834E-02
SCHWARZ (1978) CRITERION - SC = .13202E-02
AKAIKE (1974) INFORMATION CRITERION - AIC = .11397E-02

               ANALYSIS OF VARIANCE - FROM MEAN
               SS           DF    MS            F           P-VALUE
REGRESSION     .51733        2.   .25867        266.018     .000
ERROR          .13613E-01   14.   .97236E-03
TOTAL          .53094       16.   .33184E-01

               ANALYSIS OF VARIANCE - FROM ZERO
               SS           DF    MS            F           P-VALUE
REGRESSION     406.42        3.   135.47        139325.591  .000
ERROR          .13613E-01   14.   .97236E-03
TOTAL          406.44       17.   23.908

VARIABLE   ESTIMATED    STANDARD    T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR       14 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
LY          1.1432      .1560        7.328  .000      .891     .3216        1.1432
LP         -.82884      .3611E-01  -22.95   .000     -.987    -1.0074       -.8288
CONSTANT    3.1636      .7048        4.489  .001      .768     .0000        3.1636

|_* Hypothesis testing
|_TEST LY=1
TEST VALUE =     .14316    STD. ERROR OF TEST VALUE   .15600
T STATISTIC =   .91766674  WITH 14 D.F.   P-VALUE= .37433
F STATISTIC =   .84211225  WITH 1 AND 14 D.F.   P-VALUE= .37433
WALD CHI-SQUARE STATISTIC = .84211225  WITH 1 D.F.   P-VALUE= .35879
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST LP=-1
TEST VALUE =     .17116    STD. ERROR OF TEST VALUE   .36111E-01
T STATISTIC =   4.7398530  WITH 14 D.F.   P-VALUE= .00032
F STATISTIC =   22.466206  WITH 1 AND 14 D.F.   P-VALUE= .00032
WALD CHI-SQUARE STATISTIC = 22.466206  WITH 1 D.F.   P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .04451
|_*
|_* A joint test
|_TEST
|_ TEST LY=1
|_ TEST LP=-1
|_END
F STATISTIC =   13.275308  WITH 2 AND 14 D.F.   P-VALUE= .00058
WALD CHI-SQUARE STATISTIC = 26.550616  WITH 2 D.F.   P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .07533
|_*
|_* Now duplicate the F-test that is reported with the ANOVA option
|_TEST
|_ TEST LY=0
|_ TEST LP=0
|_END
F STATISTIC =   266.01794  WITH 2 AND 14 D.F.   P-VALUE= .00000
WALD CHI-SQUARE STATISTIC = 532.03587  WITH 2 D.F.   P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .00376
|_STOP

Estimation with Restrictions


Linear parameter restrictions can be imposed in model estimation. There are 2 ways of proceeding. One way is to substitute the restrictions into the equation to obtain a reparametrized equation. OLS applied to the new equation will yield restricted estimates.


The second way is to obtain restricted least squares estimates as the solution to a constrained least squares minimization problem. When this approach is followed, the general command format for restricted least squares is:
OLS depvar indeps / RESTRICT options
RESTRICT equation1
RESTRICT equation2
  .
  .
  .
END

The RESTRICT option on the OLS command tells SHAZAM that RESTRICT commands follow with the specification of the linear restrictions. The END command is required to mark the end of the list of RESTRICT commands. The RESTRICT commands are specified as a linear function of the variables specified in the indeps list on the OLS command. The variable names actually represent the coefficients. Note that each restriction will add one degree of freedom.

Example

Parameter restrictions may be suggested by economic theory. For example, constant returns to scale may impose parameter restrictions in a production function. Another application of restricted estimation, that is developed in this example, is to obtain more precise estimates in the presence of multicollinearity. This example, from Griffiths, Hill and Judge, uses the Klein-Goldberger data set. The study considers the relationship between aggregate consumption (C) and 3 components of income: wage income (W), nonwage-nonfarm income (P) and farm income (A) for the U.S. economy. It can be expected that components of income move together - so multicollinearity may be a problem. The regression equation is:

   Ct = β0 + β1 Wt + β2 Pt + β3 At + et

The SHAZAM commands (filename: KLEING.SHA) below use the STAT command to get the sample correlation matrix for the income variables. The equation is then estimated by OLS. Prior expertise suggests that reasonable parameter restrictions are:

   β2 = 0.75 β1   and   β3 = 0.625 β1

The equation is estimated by restricted least squares and tests of the validity of the parameter restrictions are considered.
SAMPLE 1 20
READ (KLEING.txt) C W P A
STAT W P A / PCOR
* Unrestricted estimation
OLS C W P A
* Restricted estimation
OLS C W P A / RESTRICT
RESTRICT P=0.75*W
RESTRICT A=0.625*W
END
* --------------------------------------------------------------------
* An equivalent way of obtaining the restricted least squares estimates
* is to make a substitution for the restrictions as follows.
GENR X = W + 0.75*P + 0.625*A
OLS C X
* Get the estimated coefficient on P
TEST 0.75*X
* Get the estimated coefficient on A
TEST 0.625*X
* --------------------------------------------------------------------
*
* Test to determine if the restrictions are accepted or rejected
* Individual Restriction Test : t-test
OLS C W P A
TEST P=0.75*W
TEST A=0.625*W
* Joint Test of the Restrictions : F-test
TEST
  TEST P=0.75*W
  TEST A=0.625*W
END
STOP

The SHAZAM output can be inspected. First look at the correlation matrix of the explanatory variables. This is reported as:
CORRELATION MATRIX OF VARIABLES - 20 OBSERVATIONS

W      1.0000
P      .71847   1.0000
A      .91517   .63061   1.0000
       W        P        A

The correlation coefficient of 0.915 indicates a strong linear association between wage income (W) and farm income (A) - a sign of a multicollinearity problem. The unrestricted estimation reports the results:
R-SQUARE = .9527    R-SQUARE ADJUSTED = .9438

VARIABLE   ESTIMATED    STANDARD   T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
 NAME      COEFFICIENT  ERROR      16 DF   P-VALUE  CORR.    COEFFICIENT   AT MEANS
W           1.0588      .1736      6.100   .000      .836     .9226        .7683
P           .45224      .6558      .6897   .500      .170     .0542        .1106
A           .12115      1.087      .1114   .913      .028     .0151        .0088
CONSTANT    8.1328      8.921      .9116   .375      .222     .0000        .1122

The parameter estimate on W is 1.06 (somewhat large) - this implies that a $1 increase in wage income leads to more than a $1 increase in consumption expenditure. The effects of P and A do not appear to be individually significant - although a high R-square is reported. The restricted estimation reports the results:
R-SQUARE = .9517     R-SQUARE ADJUSTED = .9490

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      18 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
W          .96639      .5133E-01   18.83    .000     .976     .8421         .7013
P          .72479      .3850E-01   18.83    .000     .976     .0868         .1773
A          .60399      .3208E-01   18.83    .000     .976     .0753         .0441
CONSTANT   5.6048      3.680       1.523    .145     .338     .0000         .0773

The effect of the restrictions is to lower the standard errors of each of the estimated coefficients.
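
The joint F statistic for the restrictions can be checked by hand from the two sums of squared errors reported above (a worked check, with J = 2 restrictions and 16 unrestricted error degrees of freedom):

F = ((SSE_R - SSE_U)/J) / (SSE_U/(N-K)) = ((334.88 - 327.93)/2) / (327.93/16) = 3.475/20.496 = 0.1695

which agrees, up to rounding of the displayed SSE values, with the value .16949 printed after the END command in the output below.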

SHAZAM output comparing unrestricted and restricted estimation

Discussion of these results is available in Griffiths, Hill and Judge [1993, Chapter 13] or Judge, Hill, Griffiths, Lutkepohl and Lee [1988, Chapter 21].
|_SAMPLE 1 20
|_READ (KLEING.txt) C W P A
UNIT 88 IS NOW ASSIGNED TO: KLEING.txt
   4 VARIABLES AND   20 OBSERVATIONS STARTING AT OBS   1
|_STAT W P A / PCOR
NAME    N    MEAN     ST. DEV   VARIANCE   MINIMUM   MAXIMUM
W      20    52.584   16.641    276.93     33.590    80.970
P      20    17.725   2.2876    5.2333     13.390    22.120
A      20    5.2935   2.3815    5.6715     1.6700    9.3000

CORRELATION MATRIX OF VARIABLES - 20 OBSERVATIONS
W    1.0000
P    .71847   1.0000
A    .91517   .63061   1.0000
     W        P        A

|_* Unrestricted estimation
|_OLS C W P A
OLS ESTIMATION
20 OBSERVATIONS     DEPENDENT VARIABLE = C
...NOTE..SAMPLE RANGE SET TO: 1, 20

R-SQUARE = .9527     R-SQUARE ADJUSTED = .9438
VARIANCE OF THE ESTIMATE-SIGMA**2 = 20.496
STANDARD ERROR OF THE ESTIMATE-SIGMA = 4.5272
SUM OF SQUARED ERRORS-SSE= 327.93
MEAN OF DEPENDENT VARIABLE = 72.465
LOG OF THE LIKELIHOOD FUNCTION = -56.3495

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      16 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
W          1.0588       .1736      6.100    .000     .836     .9226         .7683
P          .45224       .6558      .6897    .500     .170     .0542         .1106
A          .12115      1.087       .1114    .913     .028     .0151         .0088
CONSTANT   8.1328      8.921       .9116    .375     .222     .0000         .1122

|_* Restricted estimation
|_OLS C W P A / RESTRICT
OLS ESTIMATION
20 OBSERVATIONS     DEPENDENT VARIABLE = C
...NOTE..SAMPLE RANGE SET TO: 1, 20
|_RESTRICT P=0.75*W
|_RESTRICT A=0.625*W


|_END
F TEST ON RESTRICTIONS=   .16949   WITH   2 AND 16 DF   P-VALUE= .84559

R-SQUARE = .9517     R-SQUARE ADJUSTED = .9490
VARIANCE OF THE ESTIMATE-SIGMA**2 = 18.604
STANDARD ERROR OF THE ESTIMATE-SIGMA = 4.3133
SUM OF SQUARED ERRORS-SSE= 334.88
MEAN OF DEPENDENT VARIABLE = 72.465
LOG OF THE LIKELIHOOD FUNCTION = -56.5591

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      18 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
W          .96639      .5133E-01   18.83    .000     .976     .8421         .7013
P          .72479      .3850E-01   18.83    .000     .976     .0868         .1773
A          .60399      .3208E-01   18.83    .000     .976     .0753         .0441
CONSTANT   5.6048      3.680       1.523    .145     .338     .0000         .0773

|_* --------------------------------------------------------------------
|_* An equivalent way of obtaining the restricted least squares estimates
|_* is to make a substitution for the restrictions as follows.
|_GENR X = W + 0.75*P + 0.625*A
|_OLS C X
OLS ESTIMATION
20 OBSERVATIONS     DEPENDENT VARIABLE = C
...NOTE..SAMPLE RANGE SET TO: 1, 20

R-SQUARE = .9517     R-SQUARE ADJUSTED = .9490
VARIANCE OF THE ESTIMATE-SIGMA**2 = 18.604
STANDARD ERROR OF THE ESTIMATE-SIGMA = 4.3133
SUM OF SQUARED ERRORS-SSE= 334.88
MEAN OF DEPENDENT VARIABLE = 72.465
LOG OF THE LIKELIHOOD FUNCTION = -56.5591

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      18 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
X          .96639      .5133E-01   18.83    .000     .976     .9755         .9227
CONSTANT   5.6048      3.680       1.523    .145     .338     .0000         .0773

|_* Get the estimated coefficient on P
|_TEST 0.75*X
TEST VALUE =   .72479   STD. ERROR OF TEST VALUE   .38496E-01
T STATISTIC =   18.827780   WITH 18 D.F.   P-VALUE= .00000
F STATISTIC =   354.48529   WITH 1 AND 18 D.F.   P-VALUE= .00000
WALD CHI-SQUARE STATISTIC =   354.48529   WITH 1 D.F.   P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .00282
|_* Get the estimated coefficient on A
|_TEST 0.625*X
TEST VALUE =   .60399   STD. ERROR OF TEST VALUE   .32080E-01
T STATISTIC =   18.827780   WITH 18 D.F.   P-VALUE= .00000
F STATISTIC =   354.48529   WITH 1 AND 18 D.F.   P-VALUE= .00000
WALD CHI-SQUARE STATISTIC =   354.48529   WITH 1 D.F.   P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .00282
|_* --------------------------------------------------------------------
|_*
|_* Test to determine if the restrictions are accepted or rejected
|_* Individual Restriction Test : t-test
|_OLS C W P A
OLS ESTIMATION
20 OBSERVATIONS     DEPENDENT VARIABLE = C
...NOTE..SAMPLE RANGE SET TO: 1, 20
R-SQUARE = .9527     R-SQUARE ADJUSTED = .9438


VARIANCE OF THE ESTIMATE-SIGMA**2 = 20.496
STANDARD ERROR OF THE ESTIMATE-SIGMA = 4.5272
SUM OF SQUARED ERRORS-SSE= 327.93
MEAN OF DEPENDENT VARIABLE = 72.465
LOG OF THE LIKELIHOOD FUNCTION = -56.3495

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      16 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
W          1.0588       .1736      6.100    .000     .836     .9226         .7683
P          .45224       .6558      .6897    .500     .170     .0542         .1106
A          .12115      1.087       .1114    .913     .028     .0151         .0088
CONSTANT   8.1328      8.921       .9116    .375     .222     .0000         .1122

|_TEST P=0.75*W
TEST VALUE =   -.34184   STD. ERROR OF TEST VALUE   .72396
T STATISTIC =   -.47218600   WITH 16 D.F.   P-VALUE= .64317
F STATISTIC =   .22295962   WITH 1 AND 16 D.F.   P-VALUE= .64317
WALD CHI-SQUARE STATISTIC =   .22295962   WITH 1 D.F.   P-VALUE= .63679
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST A=0.625*W
TEST VALUE =   -.54059   STD. ERROR OF TEST VALUE   1.1812
T STATISTIC =   -.45764552   WITH 16 D.F.   P-VALUE= .65336
F STATISTIC =   .20943942   WITH 1 AND 16 D.F.   P-VALUE= .65336
WALD CHI-SQUARE STATISTIC =   .20943942   WITH 1 D.F.   P-VALUE= .64721
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_* Joint Test of the Restrictions : F-test
|_TEST
|_TEST P=0.75*W
|_TEST A=0.625*W
|_END
F STATISTIC =   .16949471   WITH 2 AND 16 D.F.   P-VALUE= .84559
WALD CHI-SQUARE STATISTIC =   .33898942   WITH 2 D.F.   P-VALUE= .84409
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_STOP

Prediction
Following an estimation command, the FC command will generate predictions and forecast standard errors. For prediction with the linear regression model, the general command format is:
OLS depvar indeps / options
FC / options

Some useful options with the FC command are:

LIST       Prints the predictions, the forecast standard errors and forecast diagnostics.
BEG= END=  Specifies the start and end observation numbers for the predictions.
FCSE=      Saves the forecast standard errors in the variable specified. The variable
           must be defined (for example, with the DIM command) before the model estimation.
PREDICT=   Saves the predictions in the variable specified. The variable must be defined
           (for example, with the DIM command) before the model estimation.

More options and features of the FC command are described in the SHAZAM User's Reference Manual.

Example

This example analyzes voting patterns in the state of Florida for the presidential election held on November 7, 2000. The 2000 presidential race emerged as a close contest between Al Gore and George W. Bush. On election day, the results revealed that the Florida outcome would determine the next President of the United States. However, the Florida election results showed a difference of only a few hundred votes between Gore and Bush. A final decision was delayed until various recounts and counts of absentee ballots could be completed. An additional controversy was that the "butterfly" ballot design in the county of Palm Beach may have confused voters. There was speculation that Palm Beach voters that intended to vote for Gore may have mistakenly given their vote to Buchanan.

Adams and Fastnow present a statistical analysis for detecting the possibility of voting irregularities in Palm Beach. The reference is: Greg D. Adams and Chris Fastnow, "A Note on the Voting Irregularities in Palm Beach, Florida" (downloaded from the internet).

A data set contains the Florida county-level returns for the 2000 presidential election. It is proposed that an estimate of the number of votes for Buchanan in Palm Beach county can be predicted from a linear regression equation that relates Buchanan's votes to Bush's votes in the other Florida counties. Adams and Fastnow give the following reasoning. "There are theoretical reasons to think that the number of Buchanan's votes should correlate with Bush's. First, for any candidate, a large county with many people will generally provide the candidate more votes than a county with fewer people, all else being equal. Second, holding size of the county constant, a more conservative county should favor both Buchanan and Bush in a proportionate way. It thus seemed reasonable to us to expect a systematic relationship between the two candidates' votes."

The SHAZAM commands below estimate the relationship between votes for Buchanan and Bush in the counties of Florida excluding Palm Beach. The FC command is then used to predict the number of votes for Buchanan in Palm Beach county. From the results, a 99% prediction interval is calculated.
SAMPLE 1 67
READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
DIM YHAT 67 SE 67
* Estimate the relationship between votes for Buchanan and Bush
* in the counties of Florida excluding Palm Beach.
SAMPLE 1 66
OLS BUCHANAN BUSH
* Predict the number of votes for Buchanan in Palm Beach.
FC / LIST BEG=67 END=67 PREDICT=YHAT FCSE=SE
* Calculate a 99% prediction interval
* Obtain the critical value.
GEN1 DF=$N-$K
SAMPLE 1 1
GEN1 ALPHA=0.01/2
DISTRIB ALPHA / TYPE=T DF=DF INVERSE CRITICAL=TC
SAMPLE 67 67
GENR YUP=YHAT+TC*SE
GENR YLOW=YHAT-TC*SE
* Print the prediction interval
PRINT YLOW YUP


* Scatterplot
SAMPLE 1 67
GRAPH BUCHANAN BUSH / NOKEY
STOP

The SHAZAM output can be viewed. The results show that, assuming that Palm Beach voting patterns are similar to the other Florida counties, the predicted number of votes for Buchanan in Palm Beach county is 601. The 99% prediction interval is:
[289, 914]
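
As a check, the interval endpoints can be reproduced from the prediction, the t critical value and the forecast standard error reported in the SHAZAM output below:

601.33 ± 2.6553 × 117.711 = [288.8, 913.9]

which matches the printed values 288.7689 and 913.8837 up to rounding of the displayed inputs.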

In Palm Beach county, the actual number of votes for Buchanan of 3407 exceeded the upper limit of the prediction interval by more than 2400 votes. A scatterplot, that shows the outlier Buchanan result in Palm Beach county, is displayed below.

Model Critique

The scatterplot shown above highlights the variation in population size for the 67 Florida counties. Summary statistics for the total number of votes by county are given below.

Mean                 88,912
First Quartile        8,021
Median               35,149
Third Quartile      103,110
Maximum             625,362  (Miami-Dade County)
Palm Beach County   432,286

The SHAZAM commands for calculating the summary statistics are available. A few counties with relatively large population size (including Palm Beach county) are pulling up the mean to a value that exceeds the median. It may be reasonable to expect that large counties will have higher variability in the Buchanan vote than counties in the lower quartile with fewer than 8,000 total votes. For the simple linear regression model estimated above, this will be revealed in heteroskedastic errors. In the presence of heteroskedasticity, the confidence intervals calculated from the least squares estimation results will be incorrect. Therefore, tests for heteroskedasticity should be inspected.

An alternative modelling approach is to use log-transformed data. The log transformation rescales the data and therefore may correct for heteroskedasticity that is observed in the linear model. In particular, the observations in the upper quartile are compressed so that the difference with the other observations is less extreme. The results for tests for heteroskedasticity and prediction with log-transformed variables are available.

Concluding Remarks

Adams and Fastnow tried a number of other model variations and concluded: "If one holds to the statistical assumptions of most of these models, and if Buchanan's unusual performance can be attributed to voters who intended to vote for Gore (an assumption that some have contested), then it can be claimed with a fairly high degree of statistical confidence that the mistakes cost Gore a significant share of votes."

Note: This example is provided for teaching purposes only to illustrate econometric methodology that can be implemented with the SHAZAM software. The example is not intended to make any political comment.

SHAZAM output
|_SAMPLE 1 67
|_READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
UNIT 88 IS NOW ASSIGNED TO: PRES2000.txt
   5 VARIABLES AND   67 OBSERVATIONS STARTING AT OBS   1
|_DIM YHAT 67 SE 67
|_* Estimate the relationship between votes for Buchanan and Bush
|_* in the counties of Florida excluding Palm Beach.
|_SAMPLE 1 66
|_OLS BUCHANAN BUSH
OLS ESTIMATION
66 OBSERVATIONS     DEPENDENT VARIABLE= BUCHANAN
...NOTE..SAMPLE RANGE SET TO: 1, 66


R-SQUARE = 0.7511     R-SQUARE ADJUSTED = 0.7472
VARIANCE OF THE ESTIMATE-SIGMA**2 = 12880.
STANDARD ERROR OF THE ESTIMATE-SIGMA = 113.49
SUM OF SQUARED ERRORS-SSE= 0.82430E+06
MEAN OF DEPENDENT VARIABLE = 213.00
LOG OF THE LIKELIHOOD FUNCTION = -404.927

VARIABLE  ESTIMATED    STANDARD    T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR       64 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
BUSH      0.34962E-02  0.2516E-03   13.90    0.000    0.867    0.8666        0.6857
CONSTANT   66.940      17.48        3.829    0.000    0.432    0.0000        0.3143

|_* Predict the number of votes for Buchanan in Palm Beach.
|_FC / LIST BEG=67 END=67 PREDICT=YHAT FCSE=SE
DEPENDENT VARIABLE = BUCHANAN       1 OBSERVATIONS
REGRESSION COEFFICIENTS
   0.349623785167E-02      66.9403199359
OBS.   OBSERVED   PREDICTED   CALCULATED   STD.
NO.    VALUE      VALUE       RESIDUAL     ERROR
 67    3407.0     601.33      2805.7       117.711
SUM OF ABSOLUTE ERRORS= 2805.7
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.0000
MEAN ERROR = 2805.7
SUM-SQUARED ERRORS = 0.78718E+07     MEAN SQUARE ERROR = 0.78718E+07
MEAN ABSOLUTE ERROR= 2805.7          ROOT MEAN SQUARE ERROR = 2805.7
MEAN SQUARED PERCENTAGE ERROR= 6781.6
THEIL INEQUALITY COEFFICIENT U = 0.000
DECOMPOSITION
   PROPORTION DUE TO BIAS = 1.0000
   PROPORTION DUE TO VARIANCE = 0.0000
   PROPORTION DUE TO COVARIANCE = 0.0000
DECOMPOSITION
   PROPORTION DUE TO BIAS = 1.0000
   PROPORTION DUE TO REGRESSION = 0.0000
   PROPORTION DUE TO DISTURBANCE = 0.0000
|_* Calculate a 99% prediction interval
|_* Obtain the critical value.
|_GEN1 DF=$N-$K
..NOTE..CURRENT VALUE OF $N = 66.000
..NOTE..CURRENT VALUE OF $K = 2.0000
|_SAMPLE 1 1
|_GEN1 ALPHA=0.01/2
|_DISTRIB ALPHA / TYPE=T DF=DF INVERSE CRITICAL=TC
T DISTRIBUTION   DF= 64.000   VARIANCE= 1.0323   H= 1.0000
          ROW   PROBABILITY   CRITICAL VALUE   PDF
ALPHA       1   0.50000E-02   2.6553           0.13308E-01

|_SAMPLE 67 67
|_GENR YUP=YHAT+TC*SE
|_GENR YLOW=YHAT-TC*SE
|_* Print the prediction interval
|_PRINT YLOW YUP
   YLOW        YUP
   288.7689    913.8837
|_* Scatterplot


|_SAMPLE 1 67
|_GRAPH BUCHANAN BUSH / NOKEY
67 OBSERVATIONS     SHAZAM WILL NOW MAKE A PLOT FOR YOU
|_STOP

SHAZAM commands

The SHAZAM commands below calculate summary statistics for the total number of votes in Florida by county.
SAMPLE 1 67
READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
* Calculate the total number of votes recorded in each county
GENR TOTAL=GORE+BUSH+BUCHANAN+NADER+OTHER
* Summary statistics
SAMPLE 1 67
STAT TOTAL / PMEDIAN
* Print the total number of votes for Palm Beach county
SAMPLE 67 67
PRINT TOTAL
STOP

The SHAZAM output follows.


|_SAMPLE 1 67
|_READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
UNIT 88 IS NOW ASSIGNED TO: PRES2000.txt
   5 VARIABLES AND   67 OBSERVATIONS STARTING AT OBS   1
|_* Calculate the total number of votes recorded in each county
|_GENR TOTAL=GORE+BUSH+BUCHANAN+NADER+OTHER
|_* Summary statistics
|_SAMPLE 1 67
|_STAT TOTAL / PMEDIAN
NAME     N    MEAN     ST. DEV       VARIANCE      MINIMUM   MAXIMUM
TOTAL   67    88912.   0.13180E+06   0.17370E+11   2410.0    0.62536E+06
VARIABLE = TOTAL
MEDIAN = 35149.   LOWER 25%= 8021.0   UPPER 25%= 0.10311E+06
INTERQUARTILE RANGE= 0.9509E+05   MODE NOT APPLICABLE

|_* Print the total number of votes for Palm Beach county
|_SAMPLE 67 67
|_PRINT TOTAL
   TOTAL
   432286.0
|_STOP

Special Topics

Working with lagged variables
Using trend variables
Dummy Variables - Modelling structural change, seasonality and more


Working with lagged variables


Regression equations that use time series data often contain lagged variables. For example, consider the regression equation:

Yt = β0 + β1 Yt-1 + β2 Xt + et     for t = 2,...,T

where et is a random error term and the total number of observations in the data set is T. This equation contains a lagged dependent variable as an explanatory variable. This is called an autoregressive model or a dynamic model. Note that the sample period is adjusted to start at observation 2. This is because the first observation is "lost" when a lagged variable is required. So the estimation now uses T-1 observations.

Another example of a model with lagged variables is:

Yt = β0 + β1 Xt + β2 Xt-1 + β3 Xt-2 + β4 Xt-3 + ut     for t = 4,...,T

This model includes current and lagged values of the explanatory variables as regressors. This is called a distributed-lag model. In SHAZAM lagged variables are created by using the GENR command with the LAG function. For a 1-period lag, the command format is:
GENR newvar=LAG(var)

In general, for an n-period lag, the command format is:


GENR newvar=LAG(var,n)

where n is the number of lags required. Some important rules must be followed when the LAG function is used.

1. When lags are taken SHAZAM typically sets the initial undefined observations to 0. Therefore, the SAMPLE command must be adjusted to ensure that the subsequent analysis will not include the 0 observations (a sketch illustrating this follows the example below).
2. The time series data must be ordered with the earliest observation as the first observation and the most recent observation as the final observation in the data set.
3. If the left-hand side variable has the same name as the variable in the LAG function then a recursive calculation is implemented. For example, suppose capital stock is to be computed as:

CAPITAL(t) = CAPITAL(t-1) + INVEST(t)

and the initial capital stock is 25.3. The capital stock series can be computed with the SHAZAM commands:
GENR CAPITAL=25.3
SAMPLE 2 T
GENR CAPITAL=LAG(CAPITAL)+INVEST
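
As referenced in rule 1 above, a minimal sketch of the sample adjustment for a distributed-lag regression follows. The data file MYDATA.txt, the variable names and the 48-observation sample are hypothetical and should be replaced with your own.

SAMPLE 1 48
READ (MYDATA.txt) Y X
* Create one- and two-period lags of X.
* SHAZAM sets the undefined initial observations to 0.
GENR X1=LAG(X)
GENR X2=LAG(X,2)
* Drop the first two observations so the zero-filled lags are excluded
SAMPLE 3 48
OLS Y X X1 X2
STOP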

Example - Regression with a Lagged Dependent Variable

This example uses a data set on monthly sales and advertising expenditures of a dietary weight control product. It is expected that the impact of advertising expenditures (variable name ADVERT) on sales (variable name SALES) will be distributed over a number of months. A model that captures the lagged advertising effects is:

SALESt = α + β SALESt-1 + γ ADVERTt + et     for t = 2,...,T

The coefficients α, β and γ can be estimated by the method of ordinary least squares. However, the presence of the lagged dependent variable means that the OLS estimation rule does not give a linear unbiased estimator. It follows that hypothesis testing will only be approximately valid. A result that can be established is that if the error process is serially uncorrelated then the lagged dependent variable will be uncorrelated with the current period error and the OLS estimator will be consistent (close to the true parameter value with high probability in large samples).

By repeated substitution for SALESt-1 it is found that an increase of 1 unit in advertising in month t leads to an increase in sales of:

γ in period t,   γβ in period t+1,   γβ^2 in period t+2,   γβ^3 in period t+3, etc.

With |β| < 1 this gives a pattern of exponentially declining impacts as time goes on. The total increase in sales over all current and future time periods is the sum:

γ (1 + β + β^2 + ...) = γ / (1 - β)

This is the result for the sum of an infinite geometric series when |β| < 1. After only k time periods the effect is:

γ (1 + β + ... + β^k) = γ (1 - β^(k+1)) / (1 - β)

Thus at time k, the percentage of the total advertising effect realized is:

100 (1 - β^(k+1)) %

The above can be solved to find the time period k at which 100p percent of the impact on sales is expected: setting 1 - β^(k+1) = p and taking logarithms gives (k+1) log(β) = log(1 - p), so that:

k = log(1 - p) / log(β) - 1

The SHAZAM commands (filename: SALES.SHA) for equation estimation and analysis of the results follow.

SAMPLE 1 36
READ (SALES.txt) SALES ADVERT
GENR L1SALES=LAG(SALES)
* List the data and take a look
PRINT SALES L1SALES ADVERT
* Adjust the sample period
SAMPLE 2 36
OLS SALES L1SALES ADVERT / COEF=BETA
* Analyze the effect of a 1 unit increase in advertising.
GEN1 A=BETA:1
GEN1 B=BETA:2
* Get the total impact of advertising on all future sales.
GEN1 TOTAL = B/(1-A)
* Find the time period at which 95% of the impact is expected.
GEN1 P95 = LOG(1-.95)/LOG(A) - 1
PRINT TOTAL P95
* Find the expected increases in sales for up to 6 months ahead.
SAMPLE 1 7
GENR AHEAD=TIME(-1)
GENR IMPACT = B*(A**AHEAD)
PRINT AHEAD IMPACT
STOP

The SHAZAM output can be viewed. The estimated equation is:

SALESt = 7.45 + 0.528 SALESt-1 + 0.146 ADVERTt + êt

The results show that a 1 unit increase in advertising gives a 0.146 unit increase in sales in the current month. However, the total expected increase in sales in the current and all future months is calculated as:
0.146 / (1 - 0.528) = 0.310

The time period at which 95% of the effect is realized is found as:
log(1 - 0.95)/log(0.528) - 1 = 3.69

This implies that after 4 months more than 95% of the advertising effect will be reflected in the sales performance. The figure below shows the month by month sales response to advertising in the current month.


SHAZAM output - Regression with a Lagged Dependent Variable
|_SAMPLE 1 36 |_READ (SALES.txt) SALES ADVERT UNIT 88 IS NOW ASSIGNED TO: SALES.txt 2 VARIABLES AND 36 OBSERVATIONS STARTING AT OBS |_GENR L1SALES=LAG(SALES) ..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO |_* List the data and take a look |_PRINT SALES L1SALES ADVERT SALES L1SALES ADVERT 12.00000 .0000000 15.00000 20.50000 12.00000 16.00000 21.00000 20.50000 18.00000 15.50000 21.00000 27.00000 15.30000 15.50000 21.00000 23.50000 15.30000 49.00000 24.50000 23.50000 21.00000 21.30000 24.50000 22.00000 23.50000 21.30000 28.00000 28.00000 23.50000 36.00000 24.00000 28.00000 40.00000 15.50000 24.00000 3.000000 17.30000 15.50000 21.00000 25.30000 17.30000 29.00000 25.00000 25.30000 62.00000 36.50000 25.00000 65.00000 36.50000 36.50000 46.00000 29.60000 36.50000 44.00000 30.50000 29.60000 33.00000 28.00000 30.50000 62.00000 26.00000 28.00000 22.00000 21.50000 26.00000 12.00000 19.70000 21.50000 24.00000 19.00000 19.70000 3.000000 16.00000 19.00000 5.000000 20.70000 16.00000 14.00000 26.50000 20.70000 36.00000 30.60000 26.50000 40.00000 32.30000 30.60000 49.00000 29.50000 32.30000 7.000000 28.30000 29.50000 52.00000 31.30000 28.30000 65.00000 32.30000 31.30000 17.00000 26.40000 32.30000 5.000000 23.40000 26.40000 17.00000 16.40000 23.40000 1.000000 |_* Adjust the sample period |_SAMPLE 2 36 |_OLS SALES L1SALES ADVERT / COEF=BETA OLS ESTIMATION 35 OBSERVATIONS DEPENDENT VARIABLE = SALES 1


...NOTE..SAMPLE RANGE SET TO: 2, 36

R-SQUARE = .6720     R-SQUARE ADJUSTED = .6515
VARIANCE OF THE ESTIMATE-SIGMA**2 = 12.142
STANDARD ERROR OF THE ESTIMATE-SIGMA = 3.4845
SUM OF SQUARED ERRORS-SSE= 388.53
MEAN OF DEPENDENT VARIABLE = 24.606
LOG OF THE LIKELIHOOD FUNCTION = -91.7859

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      32 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
L1SALES    .52793      .1021       5.170    .000     .675     .5478         .5252
ADVERT     .14647      .3308E-01   4.428    .000     .616     .4692         .1721
CONSTANT   7.4469      2.470       3.015    .005     .470     .0000         .3027

|_* Analyze the effect of a 1 unit increase in advertising.
|_GEN1 A=BETA:1
|_GEN1 B=BETA:2
|_* Get the total impact of advertising on all future sales.
|_GEN1 TOTAL = B/(1-A)
|_* Find the time period at which 95% of the impact is expected.
|_GEN1 P95 = LOG(1-.95)/LOG(A) - 1
|_PRINT TOTAL P95
   TOTAL     .3102750
   P95       3.689629
|_* Find the expected increases in sales for up to 6 months ahead.
|_SAMPLE 1 7
|_GENR AHEAD=TIME(-1)
|_GENR IMPACT = B*(A**AHEAD)
|_PRINT AHEAD IMPACT
   AHEAD       IMPACT
   .0000000    .1464728
   1.000000    .7732677E-01
   2.000000    .4082280E-01
   3.000000    .2155141E-01
   4.000000    .1137755E-01
   5.000000    .6006502E-02
   6.000000    .3170988E-02
|_STOP

Using Trend Variables

Regression equations that use time series data may include a time index or trend variable. This trend variable can serve as a proxy for a variable that affects the dependent variable and is not directly observable -- but is highly correlated with time. For example, in the estimation of production functions a trend variable may be included as a proxy for technological change. For the estimation of consumption functions a trend variable may serve as a proxy for changes in consumer preferences. In SHAZAM a trend variable that takes the values 1, 2, ... , T can be created by using the TIME function on the GENR command. The command format is:
GENR newvar=TIME(0)


where newvar is the name to assign the trend variable.

Consider variables Y and X with annual observations Yt and Xt for t = 1, 2, ..., T. A regression equation that includes a trend variable is:

Yt = β0 + β1 Xt + β2 t + et

where et is a random error. The coefficient β2 measures the annual change in Y holding all other variables (that is, X) constant. An alternative model specification is:

ln(Yt) = β0 + β1 ln(Xt) + β2 t + ut

where ut is a random error. When the dependent variable is in log form the coefficient on the trend variable has an interpretation as a growth rate. In this example, given the same level of X,

for β2 > 0, 100(β2) is the percentage rate of growth for Y;
for β2 < 0, 100(β2) is the percentage rate of decay for Y.

This is an instantaneous growth rate. The compound annual rate of growth for Y, holding all other variables constant, is:

g = 100 (exp(β2) - 1)

An estimate for g can be computed from the OLS estimation results. From statistical theory a general result is that if Z is a normally distributed random variable then:

E[exp(Z)] = exp{E(Z) + Var(Z)/2}

Suppose a2 is the OLS estimate of β2 and V(a2) is the estimate of the variance of a2. The above result can be applied to obtain an estimate of the growth rate as:

100 (exp{a2 - V(a2)/2} - 1)

Note: The elasticities that are reported in the final column of the SHAZAM OLS estimation output must be interpreted with caution. The elasticities reported for time trend variables likely have no meaningful interpretation.
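
As a sketch, this calculation can be carried out with GEN1 commands. The coefficient value -0.0276 and standard error 0.003686 below are taken from the log-linear beef demand equation reported later in this section; the EXP function is assumed to be available on the GEN1 command in the same way that LOG is used elsewhere in this guide.

* A minimal sketch: compound growth rate implied by a trend coefficient
GEN1 A2=-0.0276
* V(a2) is the squared standard error of the trend coefficient
GEN1 V2=0.003686**2
GEN1 G=100*(EXP(A2-V2/2)-1)
PRINT G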

Example

This example comes from the term paper research done by students in Economics 326 at the University of British Columbia. The term paper assignment was to estimate a demand equation for a selected food item. A demand equation typically considers consumption as a function of price of the good, prices of other goods that may serve as substitutes or complements and income. The Econ326 students observed that the media promotes interest in health issues that can influence changes in the dietary habits of the Canadian consumer. Over the time period 1975 to 1994, the Econ326 term papers discussed some evidence for a trend towards reduction in consumption of food items such as butter, eggs and beef and an increasing popularity for chicken.

This example focuses on the demand for beef. The data set for beef demand in Canada was collected from the CANSIM Statistics Canada data base. Two features of the data set are:
1. The price variables are in the form of index numbers. 2. The consumption and income variables are measured in per capita terms. These variables were constructed by dividing the aggregate amount by total population in Canada.

The SHAZAM commands (filename: BEEF.SHA) below estimate a linear demand equation and a log-linear demand equation. A trend variable is included in the demand equations to allow for changes in consumer preferences.

SAMPLE 1 20
READ (BEEF.txt) YEAR BEEF PBEEF PCHKN INCOME PFOOD PDFL
* Convert data to real terms
GENR RPBEEF=100*PBEEF/PFOOD
GENR RPCHKN=100*PCHKN/PFOOD
GENR RINCOME=100*INCOME/PDFL
* Generate a time trend
GENR TREND=TIME(0)
* Estimate a linear demand equation
OLS BEEF RPBEEF RPCHKN RINCOME TREND
* Transform data to logarithms
GENR LBEEF=LOG(BEEF)
GENR LPBEEF=LOG(RPBEEF)
GENR LPCHKN=LOG(RPCHKN)
GENR LINCOME=LOG(RINCOME)
* Estimate a log-linear demand equation
OLS LBEEF LPBEEF LPCHKN LINCOME TREND / LOGLOG
STOP

In the above commands, the first task is to prepare the data set in a form that is suitable for the regression analysis. It is of interest to express the price and income variables in real terms. To accomplish this the price variables are divided by the consumer price index for food and the income variable is divided by the implicit GDP price index.

From economic theory, we expect that the coefficient for the price of beef (RPBEEF) should be negative. That is, as the price of beef increases the consumption of beef will decline. The coefficient for income (RINCOME) could be positive for a "normal" good or negative for an "inferior" good. An income elasticity less than one suggests that beef is not a luxury item whereas an elasticity greater than one gives evidence that beef is a luxury item. The coefficient for the price of chicken (RPCHKN) measures a cross-price effect and the sign may be:

positive, if chicken and beef are substitutes;
negative, if chicken and beef are complements;
zero, if chicken and beef are unrelated.

The SHAZAM output can be viewed. The estimation results for the linear demand equation are summarized as follows:
Variable name                 Estimated     t-Statistic   Estimated Elasticity
                              Coefficient                 at Means
RPBEEF (price of beef)        -0.325        -6.78         -0.53
RPCHKN (price of chicken)      0.030         0.30          0.05
RINCOME (per capita income)   -0.001        -0.50         -0.12
TREND (time trend)            -1.468        -5.28         not applicable

Over the sample period 1975 to 1994, average annual Canadian per capita beef consumption was 62.7 pounds. (On the SHAZAM OLS estimation output this number is reported as: MEAN OF DEPENDENT VARIABLE).

The estimation results show that the estimated coefficient for the time trend variable is negative and statistically significant. Holding all else constant, the estimated decline in per capita consumption of beef is 1.5 pounds per year. The estimated coefficient for the price of beef is negative and the t-statistic suggests that the coefficient is significantly different from zero. This result agrees with our a priori expectations.

The estimated coefficients for price of chicken and income do not appear to be significantly different from zero. This does not necessarily suggest that price of chicken and income are irrelevant to the analysis of the demand for beef in Canada. The result may reflect that the data does not have sufficient variability to produce precise estimates. In this example, high correlation between the income and the trend variable may erode our ability to get precise coefficient estimates. For reporting purposes, the equation should be presented as above since all the explanatory variables have a role in economic theory. Variables should not be excluded merely because they have large standard errors. Omitting a relevant variable may lead to biased estimators for the remaining coefficients.

The estimation results for the linear demand equation can be compared with the estimation results for the log-linear demand equation. For the log-linear equation, the estimated coefficient on the time trend variable is -0.0276 and its estimated standard error is 0.003686. The interpretation of this result is that, for the sample period 1975 to 1994 and holding all other variables constant, the percentage annual rate of decline in per capita beef consumption is:
100 (exp{-0.0276 - (0.003686)^2/2} - 1) = -2.72 %


SHAZAM output

|_SAMPLE 1 20
|_READ (BEEF.txt) YEAR BEEF PBEEF PCHKN INCOME PFOOD PDFL
UNIT 88 IS NOW ASSIGNED TO: BEEF.txt
   7 VARIABLES AND   20 OBSERVATIONS STARTING AT OBS   1


|_* Convert data to real terms
|_GENR RPBEEF=100*PBEEF/PFOOD
|_GENR RPCHKN=100*PCHKN/PFOOD
|_GENR RINCOME=100*INCOME/PDFL
|_* Generate a time trend
|_GENR TREND=TIME(0)
|_* Estimate a linear demand equation
|_OLS BEEF RPBEEF RPCHKN RINCOME TREND
OLS ESTIMATION
20 OBSERVATIONS     DEPENDENT VARIABLE = BEEF
...NOTE..SAMPLE RANGE SET TO: 1, 20

R-SQUARE = .9778     R-SQUARE ADJUSTED = .9718
VARIANCE OF THE ESTIMATE-SIGMA**2 = 2.7781
STANDARD ERROR OF THE ESTIMATE-SIGMA = 1.6668
SUM OF SQUARED ERRORS-SSE= 41.672
MEAN OF DEPENDENT VARIABLE = 62.740
LOG OF THE LIKELIHOOD FUNCTION = -35.7197

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      15 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
RPBEEF    -.32517      .4794E-01  -6.783    .000    -.868    -.3167        -.5333
RPCHKN     .30237E-01  .1004       .3013    .767     .078     .0153         .0501
RINCOME   -.58792E-03  .1185E-02  -.4962    .627    -.127    -.0786        -.1164
TREND     -1.4681      .2778      -5.284    .000    -.807    -.8744        -.2457
CONSTANT   115.78      15.73       7.361    .000     .885     .0000        1.8454

|_* Transform data to logarithms
|_GENR LBEEF=LOG(BEEF)
|_GENR LPBEEF=LOG(RPBEEF)
|_GENR LPCHKN=LOG(RPCHKN)
|_GENR LINCOME=LOG(RINCOME)
|_* Estimate a log-linear demand equation
|_OLS LBEEF LPBEEF LPCHKN LINCOME TREND / LOGLOG
OLS ESTIMATION
20 OBSERVATIONS     DEPENDENT VARIABLE = LBEEF
...NOTE..SAMPLE RANGE SET TO: 1, 20

R-SQUARE = .9815     R-SQUARE ADJUSTED = .9765
VARIANCE OF THE ESTIMATE-SIGMA**2 = .54403E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA = .23324E-01
SUM OF SQUARED ERRORS-SSE= .81604E-02
MEAN OF DEPENDENT VARIABLE = 4.1277
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -32.8912

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      15 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
LPBEEF    -.48892      .7165E-01  -6.823    .000    -.870    -.2965        -.4889
LPCHKN    -.47129E-01  .1461      -.3226    .751    -.083    -.0147        -.0471
LINCOME    .14716      .1936       .7602    .459     .193     .1065         .1472
TREND     -.27609E-01  .3686E-02  -7.490    .000    -.888   -1.0729        -.0276
CONSTANT   5.5135      1.799       3.065    .008     .621     .0000        5.5135
|_STOP

Dummy Variables


Dummy variables are very useful for capturing a variety of qualitative effects. They typically have the value 0 or 1 and so possibly a better name is "binary variable". They can be included as explanatory variables in the regression equation and the estimated coefficients and standard errors can be used in hypothesis testing. Dummy variables that measure individual attributes may be coded in the data file and these can be assigned variable names and loaded into SHAZAM with the READ command. Alternatively, it may be convenient to create the required dummy variables with SHAZAM commands.

Examples

Modelling structural change
Capturing seasonal effects
Dummy variables in models with a log-transformed dependent variable

Note: When dummy variables are used as dependent variables then analysis with the probit or logit model is required.

[SHAZAM Guide home]

Modelling structural change

Consider measuring the impact of seat belt legislation on traffic fatalities. Monthly data is available for 120 observations on traffic fatalities (D) and auto fuel consumption (C). Suppose that seat belt legislation became effective at observation 80. To estimate the impact of this legislation define a dummy variable as follows:
B = 0   for observations 1 to 79
B = 1   for observations 80 to 120

The following SHAZAM commands create the dummy variable and run the regression.

SAMPLE 1 120
READ (file) D C
* Create the dummy variable
GENR B=0
SAMPLE 80 120
GENR B=1
*
SAMPLE 1 120
* List the data and check that it is set up correctly.
PRINT D C B
* Get the OLS estimation results
OLS D C B
STOP


Note that the command GENR B=0 first initializes the dummy variable to all zeroes. Then the SAMPLE command is used to set the sample to the period of structural change and the dummy variable is then set to 1 for this period. The dummy variable is now created and the OLS estimation is then run over the entire sample period of the data. In SHAZAM there is often more than one way of doing things. An equivalent way of creating the dummy variable is illustrated in the next list of commands.

SAMPLE 1 120
READ (file) D C
GENR B=DUM(TIME(0)-79)
OLS D C B
STOP

The dummy variable is created in a single command by using the DUM and TIME functions on the GENR command. The TIME(0) function creates an observation index with values 1,2,3,. . . . The DUM function is set to 1 when the argument is > 0 (that is, positive values) and is set to 0 otherwise. In this example the argument of the DUM function is TIME(0)-79. This will have positive values for observation numbers that are greater than or equal to 80.
Diagnostic Testing

Testing for Heteroskedasticity
Testing for Autocorrelation
Testing for Structural Stability - the Chow Test

Testing for Heteroskedasticity

Heteroskedasticity refers to unequal variance in the regression errors. Heteroskedasticity can arise in a variety of ways and a number of tests have been proposed. Typically a test is designed to test the null hypothesis of homoskedasticity (equal error variance) against some specific alternative heteroskedasticity specification.

Examples

Heteroskedasticity as a function of the explanatory variables
Heteroskedasticity as a sample partition
More tests for heteroskedasticity

A Note on the Spelling of Heteroskedasticity

A 'c' is often used instead of a 'k' in the spelling of heteroskedasticity. The research by McCulloch [1985] concludes that the word is derived from Greek roots and the proper English spelling is with a 'k'.

Reference: J. Huston McCulloch, "On Heteros*edasticity", Econometrica, Vol. 53, 1985, p. 483.



Heteroskedasticity as a function of the explanatory variables

Test statistics are reported with the SHAZAM commands:


OLS . . .
DIAGNOS / HET

The DIAGNOS command uses the results from the immediately preceding OLS command to generate diagnostic tests. The HET option computes and reports tests for heteroskedasticity. These tests are obtained by using a function of the OLS residuals et as a dependent variable in an auxiliary regression. A number of alternative auxiliary regressions have been proposed; they correspond to the tests listed in the output shown below.

In this notation, Xt is a (K x 1) vector of observations on the explanatory variables (including the constant) for t=1,...,N. SSE is the sum of squared errors from the initial OLS regression. R2 and SSR are the R-square and the regression sum of squares respectively from the auxiliary regression. Note that the final two auxiliary regressions include cross-products of the explanatory variables as regressors. Therefore, the application requires at least 2 explanatory variables. The final two test statistics are not reported for regressions that specify one explanatory variable.

In "large samples" the test statistics have a chi-square distribution with degrees of freedom as given in the D.F. column. This means that critical values can be obtained from tables for the chi-square distribution, but the comparison is approximate only. References for the various test statistics are given in the SHAZAM User's Reference Manual. The ARCH (AutoRegressive Conditional Heteroskedasticity) test is in a different category from the others. This test has specific application to time series data and detects successive periods of volatility followed by successive periods of stability. This type of behaviour has been found in financial time series data. Example Heteroskedasticity has been found to be a feature of cross-section studies on household expenditure. This example, from Griffiths, Hill and Judge, uses a data set on household expenditure. The SHAZAM commands are:
SAMPLE 1 40
READ (GHJ.txt) FOOD INCOME
OLS FOOD INCOME
DIAGNOS / HET
STOP

The SHAZAM output can be inspected. The results from the DIAGNOS / HET command are:
HETEROSKEDASTICITY TESTS
                                   CHI-SQUARE TEST STATISTIC   D.F.   P-VALUE
E**2 ON YHAT:                         12.042                    1     0.00052
E**2 ON YHAT**2:                      13.309                    1     0.00026
E**2 ON LOG(YHAT**2):                 10.381                    1     0.00127
E**2 ON LAG(E**2) ARCH TEST:           2.565                    1     0.10926
LOG(E**2) ON X (HARVEY) TEST:          4.358                    1     0.03683
ABS(E) ON X (GLEJSER) TEST:           11.611                    1     0.00066
E**2 ON X TEST:
   KOENKER(R2):                       12.042                    1     0.00052
   B-P-G (SSR):                       11.283                    1     0.00078
E**2 ON X X**2 (WHITE) TEST:
   KOENKER(R2):                       14.582                    2     0.00068
   B-P-G (SSR):                       13.662                    2     0.00108

The 5% critical value from a chi-square distribution with 1 degree of freedom is 3.84. With the exception of the ARCH test, all test statistics exceed this value and so there is evidence for heteroskedasticity in the estimated residuals. Of course, the ARCH test is of no relevance to this example since the data is cross-section data and the ARCH test has application to time series data. Note that the first test statistic and the seventh test statistic are identical. As an exercise the user should verify that these tests are always identical when the regression contains one explanatory variable.
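
As a sketch of this exercise, the first statistic (the Koenker form, computed as N times the R-square from the auxiliary regression of the squared residuals on the explanatory variable) can be reproduced by hand. The commands below assume the RESID= option on the OLS command for saving residuals and the temporary variable $R2 for the R-square of the most recent regression, as described in the SHAZAM User's Reference Manual.

SAMPLE 1 40
READ (GHJ.txt) FOOD INCOME
* Save the OLS residuals in the variable E
OLS FOOD INCOME / RESID=E
GENR E2=E**2
* Auxiliary regression of the squared residuals on the explanatory variable
OLS E2 INCOME
* Koenker test statistic: N times the R-square of the auxiliary regression
GEN1 KOENKER=$N*$R2
PRINT KOENKER
STOP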



SHAZAM output with tests for heteroskedasticity

The OLS estimation results are described in further detail in Griffiths, Hill and Judge [1993, Section 5.3.2].

|_SAMPLE 1 40
|_READ (GHJ.txt) FOOD INCOME
UNIT 88 IS NOW ASSIGNED TO: GHJ.txt
   2 VARIABLES AND   40 OBSERVATIONS STARTING AT OBS   1
|_OLS FOOD INCOME
OLS ESTIMATION
40 OBSERVATIONS     DEPENDENT VARIABLE = FOOD
...NOTE..SAMPLE RANGE SET TO: 1, 40

R-SQUARE = .3171     R-SQUARE ADJUSTED = .2991
VARIANCE OF THE ESTIMATE-SIGMA**2 = 46.853
STANDARD ERROR OF THE ESTIMATE-SIGMA = 6.8449
SUM OF SQUARED ERRORS-SSE= 1780.4
MEAN OF DEPENDENT VARIABLE = 23.595
LOG OF THE LIKELIHOOD FUNCTION = -132.672

VARIABLE  ESTIMATED    STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR      38 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME     .23225      .5529E-01   4.200    .000     .563     .5631         .6871
CONSTANT   7.3832      4.008       1.842    .073     .286     .0000         .3129

|_DIAGNOS / HET
DEPENDENT VARIABLE = FOOD       40 OBSERVATIONS
REGRESSION COEFFICIENTS
   0.232253330328      7.38321754308
HETEROSKEDASTICITY TESTS
                                   CHI-SQUARE TEST STATISTIC   D.F.   P-VALUE
E**2 ON YHAT:                         12.042                    1     0.00052
E**2 ON YHAT**2:                      13.309                    1     0.00026
E**2 ON LOG(YHAT**2):                 10.381                    1     0.00127
E**2 ON LAG(E**2) ARCH TEST:           2.565                    1     0.10926
LOG(E**2) ON X (HARVEY) TEST:          4.358                    1     0.03683
ABS(E) ON X (GLEJSER) TEST:           11.611                    1     0.00066
E**2 ON X TEST:
   KOENKER(R2):                       12.042                    1     0.00052
   B-P-G (SSR):                       11.283                    1     0.00078
E**2 ON X X**2 (WHITE) TEST:
   KOENKER(R2):                       14.582                    2     0.00068
   B-P-G (SSR):                       13.662                    2     0.00108
|_STOP

Testing for Autocorrelation

The following options on the OLS command can be used to obtain test statistics for detecting the presence of autocorrelation in the residuals.

RSTAT      Lists residual statistics including the Durbin-Watson statistic.
DWPVALUE   Computes the p-value for the Durbin-Watson test statistic.
DLAG       Computes Durbin's h statistic as a test for AR(1) errors when lagged dependent
           variables are included as regressors. The one-period lagged dependent variable
           must be listed as the first explanatory variable.
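
A minimal sketch of the DLAG option follows; the data file and variable names are hypothetical. Note that the lagged dependent variable YLAG1 is listed first, as the option requires.

SAMPLE 1 40
READ (MYDATA.txt) Y X
GENR YLAG1=LAG(Y)
* Drop the first observation created by the lag
SAMPLE 2 40
OLS Y YLAG1 X / DLAG
STOP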

Examples

Using the Durbin-Watson test Using Durbin's h test

Appendix

The Distribution of the Durbin-Watson Test Statistic


Using the Durbin-Watson test

The Durbin-Watson test statistic is designed for detecting errors that follow a first-order autoregressive process. This statistic also fills an important role as a general test of model misspecification. See, for example, the discussion in Gujarati [1995, pp. 462-464]. The DWPVALUE option on the OLS command computes a p-value for the Durbin-Watson test statistic. Suppose the Durbin-Watson test statistic, d, has a calculated value of DW. For a test of the null hypothesis of no autocorrelation in the errors against the alternative hypothesis of positive autocorrelation the p-value is: p-value = P(d < DW) The computation of a p-value is useful if the Durbin-Watson test statistic falls in the inconclusive range given in statistical tables. If the p-value is less than a selected level of significance (say 0.05) then there is evidence to reject the null hypothesis. If the alternative hypothesis of interest is negative autocorrelation then the p-value is: p-value = P(d > DW) = 1 - P(d < DW) Following the OLS / DWPVALUE command the p-value for the Durbin-Watson test is available in the temporary variable $CDF. Therefore, when testing for negative autocorrelation, a p-value can be computed with the commands:
OLS . . . / DWPVALUE
GEN1 PVAL=1-$CDF
PRINT PVAL


Example

This example uses the Theil textile data set. The SHAZAM commands (filename: DW.SHA) below first estimate an equation with PRICE as the explanatory variable. But economic theory suggests that INCOME is an important variable in a demand equation. A statistical result is that if important variables are omitted from the regression then the OLS estimator is biased. The second OLS regression is the preferred model specification that includes both PRICE and INCOME as explanatory variables.
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
OLS CONSUME PRICE / RSTAT DWPVALUE
* Now include the variable INCOME in the regression equation
OLS CONSUME INCOME PRICE / RSTAT DWPVALUE
* Compute a p-value for testing for negative autocorrelation
GEN1 PVAL=1-$CDF
PRINT PVAL
STOP

The SHAZAM output can be inspected. The first OLS regression reports the results:
DURBIN-WATSON STATISTIC =   1.19071
DURBIN-WATSON POSITIVE AUTOCORRELATION TEST P-VALUE =  0.018346
               NEGATIVE AUTOCORRELATION TEST P-VALUE =  0.981655

The estimation uses 17 observations and there are 2 estimated coefficients (including the intercept parameter). If we ignore the p-value and rely on tables printed at the end of textbooks we find that the lower and upper critical values are 1.133 and 1.381 (for a 5% significance level) and 0.874 and 1.102 (for a 1% significance level). When compared with the reported Durbin-Watson statistic the finding is that at a 5% level there is evidence for positive autocorrelation but at the 1% level the null hypothesis of no autocorrelation is not rejected. The computed p-value verifies this conclusion. When the variable INCOME is added to the regression the SHAZAM estimation results report:
DURBIN-WATSON STATISTIC =   2.01855
DURBIN-WATSON POSITIVE AUTOCORRELATION TEST P-VALUE =  0.301270
               NEGATIVE AUTOCORRELATION TEST P-VALUE =  0.698730

By inspecting the p-value, the conclusion is that when both PRICE and INCOME are included in the regression there is no evidence to reject the null hypothesis of no autocorrelation in the errors. The regression equation that omitted INCOME showed evidence for autocorrelated errors. However, this appears to reflect that an important variable has been omitted - rather than a need to correct for autocorrelation. That is, the omitted variable INCOME is highly autocorrelated and when this variable is included in the regression (as economic theory would typically suggest) the autocorrelation in the residuals disappears.




SHAZAM output with Durbin-Watson test statistics

|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND   17 OBSERVATIONS STARTING AT OBS   1
|_OLS CONSUME PRICE / RSTAT DWPVALUE
OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO: 1, 17
DURBIN-WATSON STATISTIC =   1.19071
DURBIN-WATSON POSITIVE AUTOCORRELATION TEST P-VALUE =  0.018346
               NEGATIVE AUTOCORRELATION TEST P-VALUE =  0.981655

R-SQUARE = 0.8961     R-SQUARE ADJUSTED = 0.8892
VARIANCE OF THE ESTIMATE-SIGMA**2 = 61.594
STANDARD ERROR OF THE ESTIMATE-SIGMA = 7.8482
SUM OF SQUARED ERRORS-SSE= 923.91
MEAN OF DEPENDENT VARIABLE = 134.51
LOG OF THE LIKELIHOOD FUNCTION = -58.0829

VARIABLE  ESTIMATED    STANDARD  T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR     15 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
PRICE     -1.3233      0.1163    -11.38    0.000   -0.947   -0.9466       -0.7508
CONSTANT   235.49      9.079      25.94    0.000    0.989    0.0000        1.7508

DURBIN-WATSON = 1.1907   VON NEUMANN RATIO = 1.2651   RHO = 0.38554
RESIDUAL SUM = 0.00000   RESIDUAL VARIANCE = 61.594
SUM OF ABSOLUTE ERRORS= 102.14
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.8961
RUNS TEST: 6 RUNS, 9 POS, 0 ZERO, 8 NEG   NORMAL STATISTIC = -1.7451
|_* Now include the variable INCOME in the regression equation
|_OLS CONSUME INCOME PRICE / RSTAT DWPVALUE
OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO: 1, 17
DURBIN-WATSON STATISTIC =   2.01855
DURBIN-WATSON POSITIVE AUTOCORRELATION TEST P-VALUE =  0.301270
               NEGATIVE AUTOCORRELATION TEST P-VALUE =  0.698730

R-SQUARE = 0.9513     R-SQUARE ADJUSTED = 0.9443
VARIANCE OF THE ESTIMATE-SIGMA**2 = 30.951
STANDARD ERROR OF THE ESTIMATE-SIGMA = 5.5634
SUM OF SQUARED ERRORS-SSE= 433.31
MEAN OF DEPENDENT VARIABLE = 134.51
LOG OF THE LIKELIHOOD FUNCTION = -51.6471

VARIABLE  ESTIMATED    STANDARD    T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR       14 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME     1.0617      0.2667       3.981    0.001    0.729    0.2387        0.8129
PRICE     -1.3830      0.8381E-01  -16.50    0.000   -0.975   -0.9893       -0.7846
CONSTANT   130.71      27.09        4.824    0.000    0.790    0.0000        0.9718


DURBIN-WATSON = 2.0185   VON NEUMANN RATIO = 2.1447   RHO = -0.18239
RESIDUAL SUM = -0.53291E-14   RESIDUAL VARIANCE = 30.951
SUM OF ABSOLUTE ERRORS= 72.787
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.9513
RUNS TEST: 7 RUNS, 9 POS, 0 ZERO, 8 NEG   NORMAL STATISTIC = -1.2423
|_* Compute a p-value for testing for negative autocorrelation
|_GEN1 PVAL=1-$CDF
..NOTE..CURRENT VALUE OF $CDF = 0.30127
|_PRINT PVAL
   PVAL
   0.6987301
|_STOP

Testing for Structural Stability - the Chow Test


It may be of interest to test for stability of regression coefficients between two periods. A change in parameters between two periods is an indication of structural change. Following an OLS estimation, the Chow test statistic for structural change is reported with the commands:
OLS . . .
DIAGNOS / CHOWONE=n1

where n1 is the number of observations in the first group. Alternatively, to get test statistics computed for every breakpoint in the data set the following commands can be used.
OLS . . .
DIAGNOS / CHOWTEST

The computations required to obtain the Chow test statistic can be illustrated with SHAZAM commands. These commands give an example of programming in SHAZAM.

Example

This example is from Exercise 8.35 of Gujarati [1995, pp. 279-280]. A data set on personal savings and income for the United States is available for the years 1970 to 1991. It is of interest to investigate if there is a significant change in the savings-income relationship for the periods 1970-1980 and 1981-1991 (the Reagan-Bush presidency era). Gujarati suggests that either a linear or log-linear model may be used to estimate a savings-income relationship. The SHAZAM commands (filename: USECON.SHA) below estimate both a linear and log-linear model. After each OLS estimation a Chow test for structural change is computed.
SAMPLE 1 22
READ (USECON.txt) YEAR SAVINGS INCOME
* Estimate the savings-income relationship
OLS SAVINGS INCOME / RSTAT DWPVALUE
DIAGNOS / CHOWONE=11
* Now consider a log-linear relationship
GENR LSAV=LOG(SAVINGS)
GENR LINC=LOG(INCOME)
OLS LSAV LINC / LOGLOG RSTAT DWPVALUE
DIAGNOS / CHOWONE=11
STOP

The SHAZAM output can be viewed. For the linear savings-income function the Chow test statistic is reported as:
SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
N1   N2   SSE1     SSE2     CHOW     PVALUE   G-Q      DF1   DF2   PVALUE
11   11   1010.8   5103.5   20.371   0.000    0.1981     9     9   0.012
CHOW TEST - F DISTRIBUTION WITH DF1=  2 AND DF2= 18

For the log-linear savings-income function the Chow test statistic is reported as:
SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
N1   N2   SSE1      SSE2      CHOW     PVALUE   G-Q      DF1   DF2   PVALUE
11   11   0.11833   0.15911   13.923   0.000    0.7437     9     9   0.333
CHOW TEST - F DISTRIBUTION WITH DF1=  2 AND DF2= 18

The above SHAZAM output uses the following notation:

N1     number of observations in group 1
N2     number of observations in group 2
SSE1   sum of squared errors from a regression for group 1
SSE2   sum of squared errors from a regression for group 2
CHOW   the Chow test statistic
G-Q    the Goldfeld-Quandt test statistic for testing for equality of error variance
       in the 2 groups:  G-Q = (SSE1/DF1)/(SSE2/DF2)
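
As a quick arithmetic check using the linear model results above, where DF1 = DF2 = 9, the ratio reduces to SSE1/SSE2:

G-Q = (1010.8/9) / (5103.5/9) = 1010.8/5103.5 = 0.1981

which matches the reported value.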

By inspecting the output it can be seen that for both the linear and log-linear model the p-value reported for the Chow test statistic is less than 0.0005. This gives evidence to reject the null hypothesis of equality of regression coefficients in the 2 periods.

It should be considered that any test statistic relies on some distributional assumptions. The derivation of the Chow test assumes that the errors have the same variance (homoskedasticity) in the 2 groups and the errors are independently distributed (that is, no autocorrelation). Are these assumptions reasonable for this example?

The SHAZAM output for the Chow test statistic also reports the Goldfeld-Quandt test statistic for equal variance in the 2 groups. The above output shows that for both the linear and the log-linear model the calculated test statistic is less than 1. The p-value that is reported at the extreme right of the SHAZAM output is the p-value for a test of the null hypothesis of equal variance against the alternative hypothesis of larger variance in the second group compared to the first group. The results

show that there is evidence for heteroskedasticity in the linear model. However, the homoskedasticity assumption appears reasonable for the log-linear model.

For both models the Durbin-Watson test statistic rejects the null hypothesis of no autocorrelation in the errors. Therefore, there is evidence for model misspecification. It may be reasonable to consider that savings behaviour is related to savings in the past. This can be recognized by including a lagged dependent variable as an explanatory variable in the regression equation. The next list of SHAZAM commands shows the estimation of a log-linear equation with a lagged dependent variable.
SAMPLE 1 22
READ (USECON.txt) YEAR SAVINGS INCOME
GENR LSAV=LOG(SAVINGS)
GENR LINC=LOG(INCOME)
* Estimate a log-linear model with a lagged dependent variable
GENR LSAVL1=LAG(LSAV)
* Adjust the sample period
SAMPLE 2 22
OLS LSAV LSAVL1 LINC / LOGLOG DLAG
DIAGNOS / CHOWONE=11
STOP

Note that the lagged dependent variable is included as the first explanatory variable. The DLAG option is used to obtain Durbin's h test as a test for autocorrelation when the model includes a lagged dependent variable. The SHAZAM output can be viewed. The results from the DIAGNOS command are:
SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
N1   N2   SSE1      SSE2      CHOW     PVALUE   G-Q      DF1   DF2   PVALUE
11   10   0.13517   0.15749   1.8444   0.182    0.7510     8     7   0.346
CHOW TEST - F DISTRIBUTION WITH DF1=  3 AND DF2= 15

The Chow test statistic does not reject the null hypothesis of parameter stability and the Goldfeld-Quandt test statistic shows no evidence of heteroskedasticity. Durbin's h test statistic has the value 0.077 and so there is no evidence for autocorrelation in the errors. The conclusion is that the log-linear model with a lagged dependent variable reveals no evidence for a structural change during the Reagan-Bush presidency era.

SHAZAM commands for computing the Chow test statistic

The commands below show an example of programming in SHAZAM. The commands compute a Chow test statistic for the example given above. A p-value for the test statistic is also computed. The computations should replicate the Chow test statistic that is reported with the DIAGNOS / CHOWTEST command.

SAMPLE 1 22
READ (USECON.txt) YEAR SAVINGS INCOME
* Suppress output
SET NOOUTPUT
* OLS estimation for group 1
SAMPLE 1 11
OLS SAVINGS INCOME
* The sum of squared errors is available in the temporary variable $SSE
GEN1 SSE1=$SSE
GEN1 N1=$N
* OLS estimation for group 2
SAMPLE 12 22
OLS SAVINGS INCOME
GEN1 SSE2=$SSE
GEN1 N2=$N
* OLS estimation for the complete sample.
SAMPLE 1 22
OLS SAVINGS INCOME
GEN1 SSEA=$SSE
GEN1 K=$K
* Compute the Chow test statistic
GEN1 SSEB=SSE1+SSE2
GEN1 DFDEN=N1+N2-2*K
GEN1 CHOW=((SSEA-SSEB)/K)/(SSEB/DFDEN)
* Get the p-value
SAMPLE 1 1
DISTRIB CHOW / TYPE=F DF1=K DF2=DFDEN CDF=CDF1
GEN1 PVAL=1-CDF1
PRINT CHOW PVAL
STOP

Note that following model estimation SHAZAM temporary variables are available with some results. These variables start with the $ character. The above commands make use of the following temporary variables available after the OLS command.

$N     The number of observations used in the OLS regression
$SSE   The sum of squared errors
$K     The number of coefficients

SHAZAM output


|_SAMPLE 1 22
|_READ (USECON.txt) YEAR SAVINGS INCOME
UNIT 88 IS NOW ASSIGNED TO: USECON.txt
   3 VARIABLES AND   22 OBSERVATIONS STARTING AT OBS   1
|_* Estimate the savings-income relationship
|_OLS SAVINGS INCOME / RSTAT DWPVALUE
OLS ESTIMATION
22 OBSERVATIONS     DEPENDENT VARIABLE = SAVINGS
...NOTE..SAMPLE RANGE SET TO: 1, 22
DURBIN-WATSON STATISTIC = 0.54879


DURBIN-WATSON P-VALUE = 0.000005

R-SQUARE = 0.6396     R-SQUARE ADJUSTED = 0.6216
VARIANCE OF THE ESTIMATE-SIGMA**2 = 997.69
STANDARD ERROR OF THE ESTIMATE-SIGMA = 31.586
SUM OF SQUARED ERRORS-SSE= 19954.
MEAN OF DEPENDENT VARIABLE = 136.91
LOG OF THE LIKELIHOOD FUNCTION = -106.128

VARIABLE  ESTIMATED    STANDARD    T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
NAME      COEFFICIENT  ERROR       20 DF    P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME    0.31461E-01  0.5281E-02   5.958    0.000    0.800    0.7998        0.5790
CONSTANT   57.636      14.91        3.865    0.001    0.654    0.0000        0.4210

DURBIN-WATSON = 0.5488   VON NEUMANN RATIO = 0.5749   RHO = 0.70933
RESIDUAL SUM = -0.49738E-13   RESIDUAL VARIANCE = 997.69
SUM OF ABSOLUTE ERRORS= 536.24
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.6396
RUNS TEST: 5 RUNS, 9 POS, 0 ZERO, 13 NEG   NORMAL STATISTIC = -3.0039
|_DIAGNOS / CHOWONE=11
DEPENDENT VARIABLE = SAVINGS       22 OBSERVATIONS
REGRESSION COEFFICIENTS
   0.314609350421E-01      57.6356858451
SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
N1   N2   SSE1     SSE2     CHOW     PVALUE   G-Q      DF1   DF2   PVALUE
11   11   1010.8   5103.5   20.371   0.000    0.1981     9     9   0.012
CHOW TEST - F DISTRIBUTION WITH DF1=  2 AND DF2= 18
|_* Now consider a log-linear relationship
|_GENR LSAV=LOG(SAVINGS)
|_GENR LINC=LOG(INCOME)
|_OLS LSAV LINC / LOGLOG RSTAT DWPVALUE

OLS ESTIMATION
22 OBSERVATIONS     DEPENDENT VARIABLE = LSAV
...NOTE..SAMPLE RANGE SET TO: 1, 22

DURBIN-WATSON STATISTIC = 0.67040
DURBIN-WATSON P-VALUE = 0.000040

R-SQUARE = 0.8095   R-SQUARE ADJUSTED = 0.8000
VARIANCE OF THE ESTIMATE-SIGMA**2 = 0.35331E-01
STANDARD ERROR OF THE ESTIMATE-SIGMA = 0.18797
SUM OF SQUARED ERRORS-SSE = 0.70663
MEAN OF DEPENDENT VARIABLE = 4.8416
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -99.9099

VARIABLE   ESTIMATED    STANDARD     T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR        20 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
LINC       0.66122      0.7172E-01    9.219     0.000     0.900   0.8997        0.6612
CONSTANT  -0.24110      0.5528       -0.4362    0.667    -0.097   0.0000       -0.2411

DURBIN-WATSON = 0.6704   VON NEUMANN RATIO = 0.7023   RHO = 0.64948
RESIDUAL SUM = 0.19429E-15   RESIDUAL VARIANCE = 0.35331E-01
SUM OF ABSOLUTE ERRORS = 3.3446
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.8095
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = 0.6859
RUNS TEST: 5 RUNS, 11 POS, 0 ZERO, 11 NEG   NORMAL STATISTIC = -3.0585


|_DIAGNOS / CHOWONE=11
DEPENDENT VARIABLE = LSAV     22 OBSERVATIONS
REGRESSION COEFFICIENTS
0.661217901001   -0.241102497959

SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
N1   N2   SSE1      SSE2      CHOW     PVALUE   G-Q      DF1   DF2   PVALUE
11   11   0.11833   0.15911   13.923   0.000    0.7437   9     9     0.333

CHOW TEST - F DISTRIBUTION WITH DF1= 2 AND DF2= 18
|_STOP

SHAZAM output


|_SAMPLE 1 22
|_READ (USECON.txt) YEAR SAVINGS INCOME
UNIT 88 IS NOW ASSIGNED TO: USECON.txt
   3 VARIABLES AND   22 OBSERVATIONS STARTING AT OBS 1

|_GENR LSAV=LOG(SAVINGS)
|_GENR LINC=LOG(INCOME)
|_* Estimate a log-linear model with a lagged dependent variable
|_GENR LSAVL1=LAG(LSAV)
..NOTE.LAG VALUE IN UNDEFINED OBSERVATIONS SET TO ZERO
|_* Adjust the sample period
|_SAMPLE 2 22
|_OLS LSAV LSAVL1 LINC / LOGLOG DLAG

OLS ESTIMATION
21 OBSERVATIONS     DEPENDENT VARIABLE = LSAV
...NOTE..SAMPLE RANGE SET TO: 2, 22

R-SQUARE = 0.8689   R-SQUARE ADJUSTED = 0.8543
VARIANCE OF THE ESTIMATE-SIGMA**2 = 0.22257E-01
STANDARD ERROR OF THE ESTIMATE-SIGMA = 0.14919
SUM OF SQUARED ERRORS-SSE = 0.40062
MEAN OF DEPENDENT VARIABLE = 4.8792
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -90.6883

VARIABLE   ESTIMATED    STANDARD   T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR      18 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
LSAVL1     0.62422      0.1767      3.532     0.002     0.640   0.6673        0.6242
LINC       0.20643      0.1360      1.517     0.147     0.337   0.2867        0.2064
CONSTANT   0.27423      0.4841      0.5665    0.578     0.132   0.0000        0.2742

DURBIN-WATSON = 1.9737   VON NEUMANN RATIO = 2.0724   RHO = 0.00984
RESIDUAL SUM = 0.17847E-13   RESIDUAL VARIANCE = 0.22257E-01
SUM OF ABSOLUTE ERRORS = 2.2924
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.8689
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = 0.8118
RUNS TEST: 8 RUNS, 11 POS, 0 ZERO, 10 NEG   NORMAL STATISTIC = -1.5603
DURBIN H STATISTIC (ASYMPTOTIC NORMAL) = 0.76882E-01

|_DIAGNOS / CHOWONE=11
DEPENDENT VARIABLE = LSAV     21 OBSERVATIONS


REGRESSION COEFFICIENTS
0.624221564206   0.206427954104   0.274228157820

SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
N1   N2   SSE1      SSE2      CHOW     PVALUE   G-Q      DF1   DF2   PVALUE
11   10   0.13517   0.15749   1.8444   0.182    0.7510   8     7     0.346

CHOW TEST - F DISTRIBUTION WITH DF1= 3 AND DF2= 15
|_STOP

Estimation of Models with Heteroskedastic errors

With heteroskedastic errors the OLS estimator is still unbiased. That is, the proof that the OLS estimator is unbiased does not use the homoskedasticity assumption. The heteroskedasticity affects the results in two ways:

1. The OLS estimator is not efficient (it does not have minimum variance).
2. The estimators of the variances are biased. The standard errors reported on the SHAZAM output do not make any adjustment for the heteroskedasticity - so incorrect conclusions may be made if they are used in hypothesis tests.

A number of solution approaches can be considered as follows.

1. The observed heteroskedasticity in the residuals may be an indication of model misspecification such as incorrect functional form. For example, a log-log model may reduce heteroskedasticity compared to a linear model. Gujarati [1995, p. 386] comments that the log transformation compresses the scales in which the variables are measured.

2. If the model specification is considered adequate then it may be useful to focus on just correcting the second problem above. The HETCOV option on the OLS command will compute White's (due to Hal White) heteroskedasticity-consistent covariance matrix of the parameter estimates. The SHAZAM OLS estimation output will then report standard errors that are adjusted for heteroskedastic errors. These may be larger or smaller than the uncorrected standard errors. An application of heteroskedasticity-consistent standard errors is available.

3. To obtain an efficient estimator, an estimation method is weighted least squares (WLS). This is a special case of generalized least squares (GLS). The application of this method requires specifying a functional form for the error variance. Examples of weighted least squares are:
   1. Variance with pre-set form (a minimal sketch is given below)
   2. 2-step estimation
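The commands below sketch WLS with a pre-set variance form, using the household expenditure data analyzed in the next section. The variance function VAR(e) proportional to INCOME**2 is an assumption made here for illustration only; dividing the model through by INCOME then gives a homoskedastic error.

SAMPLE 1 40
READ (GHJ.txt) FOOD INCOME
* Transform the model: FOOD/INCOME = b0*(1/INCOME) + b1 + u
GENR FOODW=FOOD/INCOME
GENR RINC=1/INCOME
* In the transformed regression the CONSTANT estimates the INCOME
* slope b1 and the coefficient on RINC estimates the intercept b0
OLS FOODW RINC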

Note: Another estimation approach that has application to the estimation of models with heteroskedastic errors is maximum likelihood estimation. This is available in SHAZAM with the HET command. This command also has options for the estimation of time series models with ARCH and GARCH errors. Users must first study the principle of maximum likelihood as well as nonlinear optimisation methods before attempting to use the HET command.


Computing Heteroskedasticity-Consistent Standard Errors

The HETCOV option on the OLS command is used to obtain standard errors that are corrected for some unknown form of heteroskedasticity.

Example

This example uses the Griffiths, Hill and Judge data set on household expenditure that was analyzed in the section on testing for heteroskedasticity. The SHAZAM commands (filename: OLSHET.SHA) below compute both the OLS standard errors and the heteroskedasticity-corrected standard errors.
SAMPLE 1 40
READ (GHJ.txt) FOOD INCOME
OLS FOOD INCOME
* Test for heteroskedasticity
DIAGNOS / HET
* Get the heteroskedasticity corrected standard errors
OLS FOOD INCOME / HETCOV
STOP

The SHAZAM output can be viewed. The OLS standard errors are reported in the following output:
VARIABLE   ESTIMATED    STANDARD    T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR       38 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME     .23225       .5529E-01    4.200     .000      .563    .5631         .6871
CONSTANT   7.3832       4.008        1.842     .073      .286    .0000         .3129

When the HETCOV option is specified the estimation results are:


VARIABLE   ESTIMATED    STANDARD    T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR       38 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME     .23225       .6911E-01    3.361     .002      .479    .5631         .6871
CONSTANT   7.3832       4.292        1.720     .094      .269    .0000         .3129

Both regressions report identical OLS estimated coefficients. But the second regression reports larger standard errors. So hypothesis testing that relies on the results of the first regression may give misleading results.



SHAZAM output

|_SAMPLE 1 40
|_READ (GHJ.txt) FOOD INCOME
UNIT 88 IS NOW ASSIGNED TO: GHJ.txt
   2 VARIABLES AND   40 OBSERVATIONS STARTING AT OBS 1
|_OLS FOOD INCOME

OLS ESTIMATION
40 OBSERVATIONS     DEPENDENT VARIABLE = FOOD
...NOTE..SAMPLE RANGE SET TO: 1, 40

R-SQUARE = .3171   R-SQUARE ADJUSTED = .2991
VARIANCE OF THE ESTIMATE-SIGMA**2 = 46.853
STANDARD ERROR OF THE ESTIMATE-SIGMA = 6.8449
SUM OF SQUARED ERRORS-SSE = 1780.4
MEAN OF DEPENDENT VARIABLE = 23.595
LOG OF THE LIKELIHOOD FUNCTION = -132.672

VARIABLE   ESTIMATED    STANDARD    T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR       38 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME     .23225       .5529E-01    4.200     .000      .563    .5631         .6871
CONSTANT   7.3832       4.008        1.842     .073      .286    .0000         .3129

|_* Test for heteroskedasticity
|_DIAGNOS / HET
DEPENDENT VARIABLE = FOOD     40 OBSERVATIONS
REGRESSION COEFFICIENTS
0.232253330328   7.38321754308

HETEROSKEDASTICITY TESTS
                                 CHI-SQUARE TEST STATISTIC   D.F.   P-VALUE
E**2 ON YHAT:                            12.042               1     0.00052
E**2 ON YHAT**2:                         13.309               1     0.00026
E**2 ON LOG(YHAT**2):                    10.381               1     0.00127
E**2 ON LAG(E**2) ARCH TEST:              2.565               1     0.10926
LOG(E**2) ON X (HARVEY) TEST:             4.358               1     0.03683
ABS(E) ON X (GLEJSER) TEST:              11.611               1     0.00066
E**2 ON X TEST:
   KOENKER(R2):                          12.042               1     0.00052
   B-P-G (SSR):                          11.283               1     0.00078
E**2 ON X X**2 (WHITE) TEST:
   KOENKER(R2):                          14.582               2     0.00068
   B-P-G (SSR):                          13.662               2     0.00108

|_* Get the heteroskedasticity corrected standard errors
|_OLS FOOD INCOME / HETCOV

OLS ESTIMATION
40 OBSERVATIONS     DEPENDENT VARIABLE = FOOD


...NOTE..SAMPLE RANGE SET TO: 1, 40

USING HETEROSKEDASTICITY-CONSISTENT COVARIANCE MATRIX
R-SQUARE = .3171   R-SQUARE ADJUSTED = .2991
VARIANCE OF THE ESTIMATE-SIGMA**2 = 46.853
STANDARD ERROR OF THE ESTIMATE-SIGMA = 6.8449
SUM OF SQUARED ERRORS-SSE = 1780.4
MEAN OF DEPENDENT VARIABLE = 23.595
LOG OF THE LIKELIHOOD FUNCTION = -132.672

VARIABLE   ESTIMATED    STANDARD    T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR       38 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
INCOME     .23225       .6911E-01    3.361     .002      .479    .5631         .6871
CONSTANT   7.3832       4.292        1.720     .094      .269    .0000         .3129
|_STOP

Estimation of Models with Autoregressive errors


Economic time series do not adjust instantaneously to changes in the economic environment. One example of a dynamic model is the regression model with first-order autoregressive errors (an AR(1) error model). The equation errors have the form:

εt = ρ εt-1 + vt      with -1 < ρ < 1

where ρ (RHO) is the autoregressive parameter and vt is another random error that is assumed to have zero mean and to be homoskedastic and serially uncorrelated. Test procedures for detecting the presence of AR(1) errors were discussed earlier in this guide. Users should be reminded that the appearance of autocorrelated errors may reflect misspecification in the structural part of the equation rather than a misspecified error structure.

The SHAZAM AUTO command is available for the estimation of models with autoregressive errors. The general command format is:
AUTO depvar indeps / options

where depvar is the dependent variable, indeps is a list of the explanatory variables and options is a list of desired options.

Cochrane-Orcutt iterative estimation

By default, SHAZAM assumes an AR(1) error model and implements model estimation by the Cochrane-Orcutt method. An iterative estimation procedure is used. The starting point is to obtain parameter estimates by OLS. The OLS residuals et are then used to obtain an estimate of ρ from the regression:

et = ρ et-1 + vt      for t = 2, ..., N

The estimate of ρ is used to construct transformed observations (the first observation is given a special transformation) and parameter estimates are obtained by applying OLS to the transformed model. A new estimate of ρ is computed and another round of parameter estimates is obtained. The iterations stop when successive estimates of ρ differ by less than 0.001.
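The commands below give a minimal sketch of a single round of this procedure for the savings-income example used earlier, with the starting RHO value copied from the RSTAT output shown above. The NOCONSTANT option (to suppress the automatic intercept in the transformed regression) is an assumption here - check the exact option name in the SHAZAM User's Reference Manual.

SAMPLE 1 22
READ (USECON.txt) YEAR SAVINGS INCOME
OLS SAVINGS INCOME / RSTAT
* The RSTAT output reports RHO = 0.70933 for this regression
GEN1 RHOHAT=0.70933
* Quasi-difference the data for t = 2, ..., N
* (the special transformation of the first observation is omitted here)
GENR SAVT=SAVINGS-RHOHAT*LAG(SAVINGS)
GENR INCT=INCOME-RHOHAT*LAG(INCOME)
GENR CONST1=1-RHOHAT
SAMPLE 2 22
* One round of the iteration: OLS on the transformed model; the
* coefficient on CONST1 estimates the original intercept
OLS SAVT CONST1 INCT / NOCONSTANT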

It is worthwhile noting that this is an example of a nonlinear least squares estimator that is derived by minimizing a sum of squared errors function. A specification of the objective function is given in Griffiths, Hill and Judge [1993, Equation (16.4.4), p. 529]. The first observation is included in the objective function by a special transformation. In general, the solution of nonlinear least squares problems requires the use of numerical optimisation algorithms. The Cochrane-Orcutt iterative method is an example of a solution algorithm.

Other estimation approaches and more general error structures

Alternative estimation algorithms for the AR(1) error model are available. A number of these are implemented in SHAZAM as options on the AUTO command. The interested user can consult the SHAZAM User's Reference Manual. The SHAZAM AUTO command can also estimate models with higher order autoregressive errors and models with moving average errors.

Tests for autocorrelation after correcting for AR(1) errors

After estimation with AR(1) errors it is useful to check if the vt errors are serially uncorrelated. The Durbin-Watson test is inappropriate because the transformed model incorporates a lagged dependent variable. A test that can be used is Durbin's h test. SHAZAM reports Durbin's h test when the RSTAT option is specified on the AUTO command.

Example

This example, from Gujarati, examines the relationship between the help-wanted index and the unemployment rate in the United States. The data set contains quarterly data from 1962 to 1967. Gujarati chooses a log-log model for the analysis. The SHAZAM command file (filename: HWI.SHA) that follows first estimates the model by OLS and tests for the possibility of AR(1) errors. Cochrane-Orcutt iterative estimation is then implemented.
SAMPLE 1 24
READ (HWI.txt) DATE HWI URATE
* Transform to logarithms
GENR LHWI=LOG(HWI)
GENR LURATE=LOG(URATE)
* OLS estimation - test for autocorrelated errors
OLS LHWI LURATE / RSTAT DWPVALUE LOGLOG
*
* Cochrane-Orcutt iterative estimation
AUTO LHWI LURATE / RSTAT LOGLOG
STOP

The SHAZAM output can be viewed. The OLS estimation results report:
DURBIN-WATSON = .9108 VON NEUMANN RATIO = .9504 RHO = .54571

SHAZAM reports the p-value for the Durbin-Watson test statistic as .000672. This gives strong evidence for positive serial correlation in the residuals. At the right, the output reports an estimate of the autoregressive parameter RHO of 0.54571. This value is less than 1 in absolute value and so is in the acceptable region for stationarity.

Therefore, this model is a candidate for estimation with AR(1) errors. The iterations in the Cochrane-Orcutt estimation procedure are shown below.
ITERATION   RHO      LOG L.F.   SSE
    1       .00000   -82.8653   .74302E-01
    2       .54571   -78.8009   .52180E-01
    3       .57223   -78.8066   .52111E-01
    4       .57836   -78.8108   .52106E-01
    5       .57999   -78.8120   .52106E-01
    6       .58044   -78.8124   .52106E-01

The starting point at iteration 1 with RHO=0 is OLS. Iteration 2 uses the RHO estimate computed from the OLS residuals (as reported on the OLS estimation output). The difference in the RHO estimate from iteration 5 to 6 is 0.58044 - 0.57999 = 0.00045. This is less than 0.001 and so the iterations stop at iteration 6.

The final column (SSE) reports the sum of squared errors (this refers to the vt error). This is the value of the nonlinear least squares objective function that is being minimized. Each iteration should show a smaller value for the SSE. It can be noted that iterations 5 and 6 show little improvement in the SSE.

The final estimate for RHO is 0.58044. It is important to take careful note of this value. A result that is not infrequently encountered is a RHO value near 1. This indicates non-stationarity and other modelling approaches may need to be investigated.

How successful was the model estimation procedure? The RSTAT option on the AUTO command produces the following output after the display of the estimation results.
DURBIN-WATSON = 1.8594   VON NEUMANN RATIO = 1.9402   RHO = .03757
DURBIN H STATISTIC (ASYMPTOTIC NORMAL) = .31712  MODIFIED FOR AUTO ORDER=1

The RHO value reported at the extreme right is now the estimate of the autoregressive parameter for the vt errors. (The previous RHO was the estimate of the autoregressive parameter for the εt errors.) Durbin's h statistic is computed to be 0.31712. This is less than the 5% critical value from the standard normal distribution of 1.96 and so the null hypothesis of no serial correlation is not rejected. However, it can be noted that only 24 observations were available and Durbin's h test may not be reliable in small samples.

SHAZAM output: Cochrane-Orcutt Iterative Estimation

This example is from Gujarati [1995, Section 12.7, pp. 433+].


|_SAMPLE 1 24
|_READ (HWI.txt) DATE HWI URATE
UNIT 88 IS NOW ASSIGNED TO: HWI.txt
   3 VARIABLES AND   24 OBSERVATIONS STARTING AT OBS 1
|_* Transform to logarithms
|_GENR LHWI=LOG(HWI)
|_GENR LURATE=LOG(URATE)
|_* OLS estimation - test for autocorrelated errors
|_OLS LHWI LURATE / RSTAT DWPVALUE LOGLOG

OLS ESTIMATION
24 OBSERVATIONS     DEPENDENT VARIABLE = LHWI
...NOTE..SAMPLE RANGE SET TO: 1, 24

DURBIN-WATSON STATISTIC = 0.91077
DURBIN-WATSON POSITIVE AUTOCORRELATION TEST P-VALUE = 0.000672
              NEGATIVE AUTOCORRELATION TEST P-VALUE = 0.999329

R-SQUARE = .9550   R-SQUARE ADJUSTED = .9530
VARIANCE OF THE ESTIMATE-SIGMA**2 = .33773E-02
STANDARD ERROR OF THE ESTIMATE-SIGMA = .58115E-01
SUM OF SQUARED ERRORS-SSE = .74302E-01
MEAN OF DEPENDENT VARIABLE = 4.9226
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -82.8653

VARIABLE   ESTIMATED    STANDARD    T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR       22 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
LURATE    -1.5375       .7114E-01   -21.61     .000     -.977    -.9772        -1.5375
CONSTANT   7.3084       .1110        65.82     .000      .997     .0000         7.3084

DURBIN-WATSON = .9108   VON NEUMANN RATIO = .9504   RHO = .54571
RESIDUAL SUM = .00000   RESIDUAL VARIANCE = .33773E-02
SUM OF ABSOLUTE ERRORS = 1.1162
R-SQUARE BETWEEN OBSERVED AND PREDICTED = .9550
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9563
RUNS TEST: 4 RUNS, 13 POS, 0 ZERO, 11 NEG   NORMAL STATISTIC = -3.7492

|_*
|_* Cochrane-Orcutt iterative estimation
|_AUTO LHWI LURATE / RSTAT LOGLOG
DEPENDENT VARIABLE = LHWI
..NOTE..R-SQUARE,ANOVA,RESIDUALS DONE ON ORIGINAL VARS

LEAST SQUARES ESTIMATION
24 OBSERVATIONS BY COCHRANE-ORCUTT TYPE PROCEDURE WITH CONVERGENCE = .00100

ITERATION   RHO      LOG L.F.   SSE
    1       .00000   -82.8653   .74302E-01
    2       .54571   -78.8009   .52180E-01
    3       .57223   -78.8066   .52111E-01
    4       .57836   -78.8108   .52106E-01
    5       .57999   -78.8120   .52106E-01
    6       .58044   -78.8124   .52106E-01

LOG L.F. = -78.8124   AT RHO = .58044

      ESTIMATE   ASYMPTOTIC VARIANCE   ASYMPTOTIC ST.ERROR   ASYMPTOTIC T-RATIO
RHO   .58044     .02763                .16622                3.49203


R-SQUARE = .9685   R-SQUARE ADJUSTED = .9670
VARIANCE OF THE ESTIMATE-SIGMA**2 = .23684E-02
STANDARD ERROR OF THE ESTIMATE-SIGMA = .48667E-01
SUM OF SQUARED ERRORS-SSE = .52106E-01
MEAN OF DEPENDENT VARIABLE = 4.9226
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -78.8124

VARIABLE   ESTIMATED    STANDARD   T-RATIO             PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  ERROR      22 DF      P-VALUE  CORR.    COEFFICIENT   AT MEANS
LURATE    -1.4712       .1251      -11.76     .000     -.929    -.9351        -1.4712
CONSTANT   7.2077       .1955       36.87     .000      .992     .0000         7.2077

DURBIN-WATSON = 1.8594   VON NEUMANN RATIO = 1.9402   RHO = .03757
RESIDUAL SUM = .84449E-02   RESIDUAL VARIANCE = .23717E-02
SUM OF ABSOLUTE ERRORS = .90632
R-SQUARE BETWEEN OBSERVED AND PREDICTED = .9684
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9710
RUNS TEST: 12 RUNS, 11 POS, 0 ZERO, 13 NEG   NORMAL STATISTIC = -.3854
DURBIN H STATISTIC (ASYMPTOTIC NORMAL) = .31712  MODIFIED FOR AUTO ORDER=1
|_STOP

Pooling Time-Series Cross-Section Data

Data sets may combine time series and cross section data. Two types of data sets are:

1. A data set with cross-sections such as states, provinces or countries.
2. A micro-panel or longitudinal data set constructed from a survey of the same micro-units over time. This type of data may typically contain a large number of cross-sectional units (for example, families or individuals) and relatively few time periods.

A pooled regression model assumes common coefficients across the cross-section units. The POOL command in SHAZAM provides features for estimating models that combine time series and cross section data. The general command format is:
POOL depvar indeps / NCROSS=n options

where depvar is the dependent variable, indeps is a list of the explanatory variables and options is a list of desired options. The NCROSS= option specifies the number of cross-section units. The data must be arranged so that all observations of a cross-section are together. That is, the complete time series for the first group must be followed by the complete time series for the second group, and so on.

Examples

The examples in this section use the investment demand data set that is analyzed in Greene [2000, Chapter 15]. A minimal command sketch follows the list of examples below.

Pooling by OLS with Panel-Corrected Standard Errors and Dummy Variables
Estimation with AR(1) Errors
Cross-Section Heteroskedasticity and Time-Wise Autoregression
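The commands below are a minimal sketch of pooled estimation, assuming the investment series I, F and C are stacked with the complete 20-year time series for each of the 5 firms together (the file name GRUNFELD.txt is hypothetical):

SAMPLE 1 100
READ (GRUNFELD.txt) I F C
* 5 cross-section units, each with 20 time series observations
POOL I F C / NCROSS=5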

Notes

Including lagged variables as explanatory variables
Estimation with a sub-set of time series observations

References

Nathaniel Beck and Jonathan N. Katz, "What to do (and not to do) with Time-Series Cross-Section Data", American Political Science Review, Vol. 89, 1995, pp. 634-647.

Nathaniel Beck, Jonathan N. Katz, R. Michael Alvarez, Geoffrey Garrett and Peter Lange, "Government Partisanship, Labor Organization and Macroeconomic Performance: A Corrigendum", American Political Science Review, Vol. 87, 1993, pp. 945-948.

A. Bhargava, L. Franzini and W. Narendranathan, "Serial Correlation and the Fixed Effects Model", Review of Economic Studies, Vol. 49, 1982, pp. 533-549.

A. Buse, "Goodness of Fit in Generalized Least Squares Estimation", American Statistician, Vol. 27, 1973, pp. 106-108.

William H. Greene, Econometric Analysis, Fourth Edition, 2000, Prentice-Hall.

J. Kmenta, Elements of Econometrics, 1986, Macmillan.

Richard W. Parks, "Efficient Estimation of a System of Regression Equations when Disturbances are both Serially and Contemporaneously Correlated", Journal of the American Statistical Association, Vol. 62, 1967, pp. 500-509.


Including lagged variables as explanatory variables

When the POOL command is used, if lagged variables are included as explanatory variables then they should be specified using the special form:
var(first.last)

where var is a variable name and the numbers in parentheses specify the first and last periods to use for lags. For example, the next command implements pooled estimation with a lagged dependent variable.
POOL Y Y(1.1) X / NCROSS=4




Estimation with a sub-set of time series observations

When the POOL command is used, a sub-set of time series observations can be selected. This is demonstrated in the next list of SHAZAM commands.
SAMPLE 1 100
* Set the number of time periods
GEN1 NT=20
* Generate an index for each cross-section
GENR CSINDEX=SUM(SEAS(NT))
* Generate a time index for each cross-section
GENR TINDEX=TIME(0)-NT*(CSINDEX-1)
* Estimate over the time period 1 to 15
SET NOWARNSKIP
SKIPIF (TINDEX.GT.15)
POOL I F C / NCROSS=5
DELETE SKIP$
STOP

Sets of Linear Equations


An economic model may contain a number of linear equations. It may be realistic to expect that the equation errors will be correlated. A set of equations that has contemporaneous cross-equation error correlation is called a seemingly unrelated regression (SUR) system. At first glance the equations seem unrelated, but they are related through the correlation in the errors. A set of seemingly unrelated regression equations can be estimated with the general command format:
SYSTEM neq / options
OLS depvar indeps
.
.
OLS depvar indeps

where neq is the number of equations and options is a list of desired options. After the SYSTEM command there must be one OLS command for each equation in the system. Options must not be specified on the OLS commands. Options are specified on the SYSTEM command. Linear parameter restrictions can be imposed with the general command format:
SYSTEM neq / RESTRICT options
OLS depvar indeps
.
.
OLS depvar indeps
RESTRICT equation
.
.
RESTRICT equation
END

The RESTRICT commands are specified as linear functions of the variables in the system. The variable names represent the coefficients. A minimal sketch is given below.
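For instance, the commands that follow sketch a two-equation system that imposes equal slope coefficients on the F variables, borrowing the variable names from the investment demand example below. This is an illustrative sketch under the command format above, not part of the worked example:

SYSTEM 2 / RESTRICT
OLS IGE FGE CGE
OLS IWH FWH CWH
RESTRICT FGE-FWH=0
END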

Example

The example in this section considers some more variations on the analysis of the investment demand data set that was explored in the previous section on pooling cross-section time-series data. The SHAZAM commands (filename: SURE.SHA) below estimate a system of investment demand equations. Following model estimation, hypothesis testing is considered to test for the validity of cross-equation parameter restrictions.
SAMPLE 1 20
READ(FIRM1.txt) YEAR IGM FGM CGM ICHR FCHR CCHR / SKIPLINES=1
READ(FIRM2.txt) YEAR IGE FGE CGE IWH FWH CWH / SKIPLINES=1
READ(FIRM3.txt) YEAR IUS FUS CUS / SKIPLINES=1
* Unrestricted SUR Estimation
SYSTEM 5 / DN
OLS IGM FGM CGM
OLS ICHR FCHR CCHR
OLS IGE FGE CGE
OLS IWH FWH CWH
OLS IUS FUS CUS
* Hypothesis Testing
TEST
 TEST FGM=FCHR
 TEST FGM=FGE
 TEST FGM=FWH
 TEST FGM=FUS
END
TEST
 TEST CGM=CCHR
 TEST CGM=CGE
 TEST CGM=CWH
 TEST CGM=CUS
END
TEST FGM=FCHR
TEST FGM=FGE
TEST FGM=FWH
TEST FGM=FUS
TEST FCHR=FGE
TEST FCHR=FWH
TEST FCHR=FUS
TEST FGE=FWH
TEST FGE=FUS
TEST FWH=FUS
TEST CGM=CCHR
TEST CGM=CGE
TEST CGM=CWH
TEST CGM=CUS
TEST CCHR=CGE
TEST CCHR=CWH
TEST CCHR=CUS
TEST CGE=CWH
TEST CGE=CUS
TEST CWH=CUS
* Test for equality of slope parameters for
* General Electric (GE) and Westinghouse (WH)
TEST
 TEST FGE=FWH
 TEST CGE=CWH
END
* Test for equality of slope parameters for


* General Motors (GM), Chrysler (CHR) and U.S. Steel (US)
TEST
 TEST FGM=FCHR
 TEST FGM=FUS
 TEST CGM=CCHR
 TEST CGM=CUS
END
STOP

The SHAZAM output can be viewed. Parameter estimates (with standard errors in parentheses) from SUR estimation as reported in Greene [2000, Table 15.6, p. 619] are:

            GM                 CH               GE                WE              US
intercept   -162.364 (89.459)  0.504 (11.513)   -22.439 (25.519)  1.089 (6.259)   85.423 (111.877)
F           0.120 (0.022)      0.070 (0.017)    0.037 (0.012)     0.057 (0.011)   0.101 (0.055)
C           0.383 (0.033)      0.309 (0.026)    0.131 (0.022)     0.042 (0.041)   0.400 (0.128)

Note: GM = General Motors, CH = Chrysler, GE = General Electric, WE = Westinghouse and US = U.S. Steel.

SHAZAM output


|_SAMPLE 1 20
|_READ(FIRM1.txt) YEAR IGM FGM CGM ICHR FCHR CCHR / SKIPLINES=1
UNIT 88 IS NOW ASSIGNED TO: FIRM1.txt
   7 VARIABLES AND   20 OBSERVATIONS STARTING AT OBS 1
|_READ(FIRM2.txt) YEAR IGE FGE CGE IWH FWH CWH / SKIPLINES=1
UNIT 88 IS NOW ASSIGNED TO: FIRM2.txt
   7 VARIABLES AND   20 OBSERVATIONS STARTING AT OBS 1
|_READ(FIRM3.txt) YEAR IUS FUS CUS / SKIPLINES=1
UNIT 88 IS NOW ASSIGNED TO: FIRM3.txt
   4 VARIABLES AND   20 OBSERVATIONS STARTING AT OBS 1
|_* Unrestricted SUR Estimation
|_SYSTEM 5 / DN
|_OLS IGM FGM CGM
|_OLS ICHR FCHR CCHR
|_OLS IGE FGE CGE
|_OLS IWH FWH CWH
|_OLS IUS FUS CUS


MULTIVARIATE REGRESSION-5 EQUATIONS
10 RIGHT-HAND SIDE VARIABLES IN SYSTEM
MAX ITERATIONS = 1   CONVERGENCE TOLERANCE = 0.10000E-02
20 OBSERVATIONS
DN OPTION IN EFFECT - DIVISOR IS N

ITERATION 0 COEFFICIENTS
0.11928       0.37144       0.77948E-01   0.31572       0.26551E-01
0.15169       0.52894E-01   0.92406E-01   0.15657       0.42387

ITERATION 0 SIGMA
 7160.3
-282.76     149.87
 607.53    -21.376     660.83
 126.18     13.307     176.45     88.662
-2222.1     418.08     904.95     546.19     8896.4

BREUSCH-PAGAN LM TEST FOR DIAGONAL COVARIANCE MATRIX
CHI-SQUARE = 29.060 WITH 10 DEGREES OF FREEDOM
LOG OF DETERMINANT OF SIGMA = 32.163
LOG OF LIKELIHOOD FUNCTION = -463.522

ITERATION 1 SIGMA INVERSE
 0.19714E-03
 0.14141E-03   0.82319E-02
-0.15017E-03   0.87747E-03   0.35267E-02
-0.57804E-03  -0.75210E-03  -0.68215E-02   0.35682E-01
 0.93359E-04  -0.39461E-03  -0.18686E-04  -0.16058E-02   0.25476E-03

ITERATION 1 COEFFICIENTS
0.12049       0.38275       0.69546E-01   0.30854       0.37291E-01
0.13078       0.57009E-01   0.41506E-01   0.10148       0.39999

ITERATION 1 SIGMA
 7216.0
-313.70     152.85
 605.34     2.0474     700.46
 129.89     16.661     200.32     94.912
-2686.5     455.09     1224.4     652.72     9188.2

LOG OF DETERMINANT OF SIGMA = 31.755
LOG OF LIKELIHOOD FUNCTION = -459.440

SYSTEM R-SQUARE = 0.9805 ... CHI-SQUARE = 78.738 WITH 10 D.F.
LIKELIHOOD RATIO TEST OF DIAGONAL COVARIANCE MATRIX = 44.065 WITH 10 D.F.

VARIABLE   COEFFICIENT   ST.ERROR      T-RATIO
FGM        0.12049       0.21629E-01    5.5709
CGM        0.38275       0.32768E-01   11.680
FCHR       0.69546E-01   0.16898E-01    4.1157
CCHR       0.30854       0.25864E-01   11.930
FGE        0.37291E-01   0.12263E-01    3.0409
CGE        0.13078       0.22050E-01    5.9313
FWH        0.57009E-01   0.11362E-01    5.0174
CWH        0.41506E-01   0.41202E-01    1.0074
FUS        0.10148       0.54784E-01    1.8523
CUS        0.39999       0.12779        3.1300

EQUATION   1 OF   5 EQUATIONS
DEPENDENT VARIABLE = IGM     20 OBSERVATIONS
R-SQUARE = 0.9207
VARIANCE OF THE ESTIMATE-SIGMA**2 = 7216.0
STANDARD ERROR OF THE ESTIMATE-SIGMA = 84.947
SUM OF SQUARED ERRORS-SSE = 0.14432E+06
MEAN OF DEPENDENT VARIABLE = 608.02
LOG OF THE LIKELIHOOD FUNCTION = -459.440

VARIABLE   ESTIMATED    ASYMPTOTIC   ASYMPTOTIC           PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  STD. ERROR   T-RATIO     P-VALUE  CORR.    COEFFICIENT   AT MEANS
FGM        0.12049      0.2163E-01    5.571      0.000     0.804   0.3520        0.8589
CGM        0.38275      0.3277E-01    11.68      0.000     0.943   0.7791        0.4082
CONSTANT  -162.36       89.47        -1.815      0.070    -0.403   0.0000       -0.2670

EQUATION   2 OF   5 EQUATIONS
DEPENDENT VARIABLE = ICHR     20 OBSERVATIONS
R-SQUARE = 0.9119
VARIANCE OF THE ESTIMATE-SIGMA**2 = 152.85
STANDARD ERROR OF THE ESTIMATE-SIGMA = 12.363
SUM OF SQUARED ERRORS-SSE = 3057.0
MEAN OF DEPENDENT VARIABLE = 86.124
LOG OF THE LIKELIHOOD FUNCTION = -459.440

VARIABLE   ESTIMATED    ASYMPTOTIC   ASYMPTOTIC           PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  STD. ERROR   T-RATIO     P-VALUE  CORR.    COEFFICIENT   AT MEANS
FCHR       0.69546E-01  0.1690E-01    4.116      0.000     0.706   0.2614        0.5598
CCHR       0.30854      0.2586E-01    11.93      0.000     0.945   0.8040        0.4344
CONSTANT   0.50430      11.52         0.4378E-01 0.965     0.011   0.0000        0.0059

EQUATION   3 OF   5 EQUATIONS
DEPENDENT VARIABLE = IGE     20 OBSERVATIONS
R-SQUARE = 0.6876
VARIANCE OF THE ESTIMATE-SIGMA**2 = 700.46
STANDARD ERROR OF THE ESTIMATE-SIGMA = 26.466
SUM OF SQUARED ERRORS-SSE = 14009.
MEAN OF DEPENDENT VARIABLE = 102.29
LOG OF THE LIKELIHOOD FUNCTION = -459.440

VARIABLE   ESTIMATED    ASYMPTOTIC   ASYMPTOTIC           PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  STD. ERROR   T-RATIO     P-VALUE  CORR.    COEFFICIENT   AT MEANS
FGE        0.37291E-01  0.1226E-01    3.041      0.002     0.594   0.3176        0.7077
CGE        0.13078      0.2205E-01    5.931      0.000     0.821   0.6746        0.5116
CONSTANT  -22.439       25.56        -0.8780     0.380    -0.208   0.0000       -0.2194

EQUATION   4 OF   5 EQUATIONS
DEPENDENT VARIABLE = IWH     20 OBSERVATIONS
R-SQUARE = 0.7264
VARIANCE OF THE ESTIMATE-SIGMA**2 = 94.912
STANDARD ERROR OF THE ESTIMATE-SIGMA = 9.7423
SUM OF SQUARED ERRORS-SSE = 1898.2
MEAN OF DEPENDENT VARIABLE = 42.892
LOG OF THE LIKELIHOOD FUNCTION = -459.440

VARIABLE   ESTIMATED    ASYMPTOTIC   ASYMPTOTIC           PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  STD. ERROR   T-RATIO     P-VALUE  CORR.    COEFFICIENT   AT MEANS
FWH        0.57009E-01  0.1136E-01    5.017      0.000     0.773   0.6634        0.8917
CWH        0.41506E-01  0.4120E-01    1.007      0.314     0.237   0.1352        0.0829
CONSTANT   1.0889       6.284         0.1733     0.862     0.042   0.0000        0.0254

EQUATION   5 OF   5 EQUATIONS
DEPENDENT VARIABLE = IUS     20 OBSERVATIONS
R-SQUARE = 0.4220
VARIANCE OF THE ESTIMATE-SIGMA**2 = 9188.2
STANDARD ERROR OF THE ESTIMATE-SIGMA = 95.855
SUM OF SQUARED ERRORS-SSE = 0.18376E+06
MEAN OF DEPENDENT VARIABLE = 405.46
LOG OF THE LIKELIHOOD FUNCTION = -459.440

VARIABLE   ESTIMATED    ASYMPTOTIC   ASYMPTOTIC           PARTIAL  STANDARDIZED  ELASTICITY
NAME       COEFFICIENT  STD. ERROR   T-RATIO     P-VALUE  CORR.    COEFFICIENT   AT MEANS
FUS        0.10148      0.5478E-01    1.852      0.064     0.410   0.2362        0.4935
CUS        0.39999      0.1278        3.130      0.002     0.605   0.4732        0.2958
CONSTANT   85.423       111.9         0.7631     0.445     0.182   0.0000        0.2107

|_* Hypothesis Testing
|_TEST
|_ TEST FGM=FCHR
|_ TEST FGM=FGE
|_ TEST FGM=FWH
|_ TEST FGM=FUS
|_END
WALD CHI-SQUARE STATISTIC = 18.886206 WITH 4 D.F.  P-VALUE= 0.00083
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.21179
|_TEST
|_ TEST CGM=CCHR
|_ TEST CGM=CGE
|_ TEST CGM=CWH
|_ TEST CGM=CUS
|_END
WALD CHI-SQUARE STATISTIC = 106.54776 WITH 4 D.F.  P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.03754
|_TEST FGM=FCHR
TEST VALUE = 0.50947E-01   STD. ERROR OF TEST VALUE 0.29181E-01
ASYMPTOTIC NORMAL STATISTIC = 1.7459342  P-VALUE= 0.08082
WALD CHI-SQUARE STATISTIC = 3.0482863 WITH 1 D.F.  P-VALUE= 0.08082
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.32805
|_TEST FGM=FGE
TEST VALUE = 0.83202E-01   STD. ERROR OF TEST VALUE 0.22161E-01
ASYMPTOTIC NORMAL STATISTIC = 3.7543680  P-VALUE= 0.00017
WALD CHI-SQUARE STATISTIC = 14.095279 WITH 1 D.F.  P-VALUE= 0.00017
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.07095
|_TEST FGM=FWH
TEST VALUE = 0.63484E-01   STD. ERROR OF TEST VALUE 0.22245E-01
ASYMPTOTIC NORMAL STATISTIC = 2.8537966  P-VALUE= 0.00432
WALD CHI-SQUARE STATISTIC = 8.1441551 WITH 1 D.F.  P-VALUE= 0.00432
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.12279
|_TEST FGM=FUS
TEST VALUE = 0.19015E-01   STD. ERROR OF TEST VALUE 0.62293E-01
ASYMPTOTIC NORMAL STATISTIC = 0.30524615  P-VALUE= 0.76018
WALD CHI-SQUARE STATISTIC = 0.93175210E-01 WITH 1 D.F.  P-VALUE= 0.76018
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST FCHR=FGE
TEST VALUE = 0.32254E-01   STD. ERROR OF TEST VALUE 0.21753E-01
ASYMPTOTIC NORMAL STATISTIC = 1.4827680  P-VALUE= 0.13814
WALD CHI-SQUARE STATISTIC = 2.1986011 WITH 1 D.F.  P-VALUE= 0.13814
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.45483
|_TEST FCHR=FWH
TEST VALUE = 0.12536E-01   STD. ERROR OF TEST VALUE 0.20543E-01
ASYMPTOTIC NORMAL STATISTIC = 0.61024150  P-VALUE= 0.54170
WALD CHI-SQUARE STATISTIC = 0.37239468 WITH 1 D.F.  P-VALUE= 0.54170
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST FCHR=FUS
TEST VALUE = -0.31933E-01   STD. ERROR OF TEST VALUE 0.55319E-01
ASYMPTOTIC NORMAL STATISTIC = -0.57724010  P-VALUE= 0.56378
WALD CHI-SQUARE STATISTIC = 0.33320613 WITH 1 D.F.  P-VALUE= 0.56378
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST FGE=FWH
TEST VALUE = -0.19718E-01   STD. ERROR OF TEST VALUE 0.10235E-01
ASYMPTOTIC NORMAL STATISTIC = -1.9265070  P-VALUE= 0.05404
WALD CHI-SQUARE STATISTIC = 3.7114292 WITH 1 D.F.  P-VALUE= 0.05404
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.26944
|_TEST FGE=FUS
TEST VALUE = -0.64187E-01   STD. ERROR OF TEST VALUE 0.53675E-01
ASYMPTOTIC NORMAL STATISTIC = -1.1958498  P-VALUE= 0.23176
WALD CHI-SQUARE STATISTIC = 1.4300568 WITH 1 D.F.  P-VALUE= 0.23176
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.69927
|_TEST FWH=FUS
TEST VALUE = -0.44469E-01   STD. ERROR OF TEST VALUE 0.51806E-01
ASYMPTOTIC NORMAL STATISTIC = -0.85837692  P-VALUE= 0.39068
WALD CHI-SQUARE STATISTIC = 0.73681094 WITH 1 D.F.  P-VALUE= 0.39068
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST CGM=CCHR
TEST VALUE = 0.74202E-01   STD. ERROR OF TEST VALUE 0.46049E-01
ASYMPTOTIC NORMAL STATISTIC = 1.6113636  P-VALUE= 0.10710
WALD CHI-SQUARE STATISTIC = 2.5964927 WITH 1 D.F.  P-VALUE= 0.10710
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.38513
|_TEST CGM=CGE
TEST VALUE = 0.25196   STD. ERROR OF TEST VALUE 0.34111E-01
ASYMPTOTIC NORMAL STATISTIC = 7.3865591  P-VALUE= 0.00000
WALD CHI-SQUARE STATISTIC = 54.561255 WITH 1 D.F.  P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.01833
|_TEST CGM=CWH
TEST VALUE = 0.34124   STD. ERROR OF TEST VALUE 0.47802E-01
ASYMPTOTIC NORMAL STATISTIC = 7.1386421  P-VALUE= 0.00000
WALD CHI-SQUARE STATISTIC = 50.960211 WITH 1 D.F.  P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.01962
|_TEST CGM=CUS
TEST VALUE = -0.17245E-01   STD. ERROR OF TEST VALUE 0.13956
ASYMPTOTIC NORMAL STATISTIC = -0.12356820  P-VALUE= 0.90166
WALD CHI-SQUARE STATISTIC = 0.15269100E-01 WITH 1 D.F.  P-VALUE= 0.90166
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST CCHR=CGE
TEST VALUE = 0.17776   STD. ERROR OF TEST VALUE 0.35586E-01
ASYMPTOTIC NORMAL STATISTIC = 4.9953052  P-VALUE= 0.00000
WALD CHI-SQUARE STATISTIC = 24.953074 WITH 1 D.F.  P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.04008
|_TEST CCHR=CWH
TEST VALUE = 0.26704   STD. ERROR OF TEST VALUE 0.48302E-01
ASYMPTOTIC NORMAL STATISTIC = 5.5284573  P-VALUE= 0.00000
WALD CHI-SQUARE STATISTIC = 30.563840 WITH 1 D.F.  P-VALUE= 0.00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.03272
|_TEST CCHR=CUS
TEST VALUE = -0.91447E-01   STD. ERROR OF TEST VALUE 0.12223
ASYMPTOTIC NORMAL STATISTIC = -0.74817908  P-VALUE= 0.45435
WALD CHI-SQUARE STATISTIC = 0.55977194 WITH 1 D.F.  P-VALUE= 0.45435
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 1.00000
|_TEST CGE=CWH
TEST VALUE = 0.89277E-01   STD. ERROR OF TEST VALUE 0.35511E-01
ASYMPTOTIC NORMAL STATISTIC = 2.5140381  P-VALUE= 0.01194
WALD CHI-SQUARE STATISTIC = 6.3203873 WITH 1 D.F.  P-VALUE= 0.01194
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.15822
|_TEST CGE=CUS
TEST VALUE = -0.26921   STD. ERROR OF TEST VALUE 0.12354
ASYMPTOTIC NORMAL STATISTIC = -2.1790789  P-VALUE= 0.02933
WALD CHI-SQUARE STATISTIC = 4.7483850 WITH 1 D.F.  P-VALUE= 0.02933
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.21060
|_TEST CWH=CUS
TEST VALUE = -0.35848   STD. ERROR OF TEST VALUE 0.12323
ASYMPTOTIC NORMAL STATISTIC = -2.9091345  P-VALUE= 0.00362
WALD CHI-SQUARE STATISTIC = 8.4630637 WITH 1 D.F.  P-VALUE= 0.00362
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.11816
|_* Test for equality of slope parameters for
|_* General Electric (GE) and Westinghouse (WH)
|_TEST
|_ TEST FGE=FWH
|_ TEST CGE=CWH
|_END
WALD CHI-SQUARE STATISTIC = 7.7527986 WITH 2 D.F.  P-VALUE= 0.02073
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.25797
|_* Test for equality of slope parameters for
|_* General Motors (GM), Chrysler (CHR) and U.S. Steel (US)
|_TEST
|_ TEST FGM=FCHR
|_ TEST FGM=FUS
|_ TEST CGM=CCHR
|_ TEST CGM=CUS
|_END
WALD CHI-SQUARE STATISTIC = 10.641944 WITH 4 D.F.  P-VALUE= 0.03090
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = 0.37587
|_STOP

Logit and Probit Analysis


When the dependent variable is a 0-1 binary variable the logit or probit model estimation methods can be used. In SHAZAM, these methods are implemented with the LOGIT and PROBIT commands. The logit model is discussed and illustrated here. The probit model can be implemented in a similar style. For the LOGIT command, the general command format is:
LOGIT depvar indeps / options

where depvar is a 0-1 binary dependent variable, indeps is a list of the explanatory variables and options is a list of desired options. The list of options is described in the SHAZAM User's Reference Manual. The logit model assumes that the response probability has the form:

P(Y=1) = exp(Xβ) / [1 + exp(Xβ)]

An equivalent form can be stated by noting that:

ln[ P(Y=1) / (1 - P(Y=1)) ] = Xβ

The function guarantees probabilities in the (0,1) range. The logit form also gives a plausible shape for the marginal effects. That is, for a continuous variable Xk, at relatively high values, a marginal change will give a relatively smaller change in the probability of a success (Y=1). (The standard logit result is that the marginal effect is dP/dXk = βk P(1-P), which is largest at P = 0.5 and shrinks as P approaches 0 or 1.) The estimation problem is to find estimates of the unknown parameters β.

Example

A data set on voting decisions for a school budget is available. The question of interest is: what factors influence the probability of a yes vote? This question can be answered by interpreting the estimation results from a logit model. SHAZAM commands are given below.
SAMPLE 1 95
READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
     LOGINC PTCON YESVM
* The income and tax variables are in logarithms -- take anti-logs
* to express the variables in thousands of $.
* Income
GENR INCOME=EXP(LOGINC)/1000
* Property taxes
GENR TAX=EXP(PTCON)/1000
* LOGIT estimation.
LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL INCOME TAX
* Now use the log transformed form of income and taxes.
LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON
* Use the LOG option to compute elasticities and marginal effects
* assuming log-transformed variables.
LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / LOG COEF=BETA
STOP

The first model estimation includes the income and property tax variables in levels. The second model estimation includes log transformations of the income and property tax variables. Rubinfeld (1977, p. 35) comments: "The inclusion of logarithmic income and price terms resulted in a better fit than the inclusion of linear forms of the variables". The SHAZAM output can be viewed. The results are discussed in the following sections:

Model Estimation by the Method of Maximum Likelihood
Interpretation of the Results
Overall Significance and Goodness of Fit Measures
Predicting Probabilities
Testing for Heteroskedasticity

References

Good textbook discussions are:

William Greene, Econometric Analysis.
Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach.

References with more technical details are:

R. Davidson and J.G. MacKinnon, "Convenient Specification Tests for Logit and Probit Models", Journal of Econometrics, Vol. 25, 1984, pp. 241-262.

D.A. Hensher and L.W. Johnson, Applied Discrete-Choice Modelling, John Wiley & Sons, 1981.

G.S. Maddala, Limited-dependent and Qualitative Variables in Econometrics, Cambridge University Press, 1983.

Kenneth Train, Qualitative Choice Analysis: Theory, Econometrics and an Application to Automobile Demand, MIT Press, 1986.

SHAZAM output


|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
|      LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
   9 VARIABLES AND   95 OBSERVATIONS STARTING AT OBS 1

|_* The income and tax variables are in logarithms -- take anti-logs
|_* to express the variables in thousands of $.
|_* Income
|_GENR INCOME=EXP(LOGINC)/1000
|_* Property taxes
|_GENR TAX=EXP(PTCON)/1000
|_* LOGIT estimation.
|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL INCOME TAX

LOGIT ANALYSIS   DEPENDENT VARIABLE = YESVM   CHOICES = 2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS   CONVERGENCE TOLERANCE = 0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037
ITERATION 1 ESTIMATES
0.54133  0.97999  0.39823  -0.23810  -0.28618E-01  1.1845  0.49110E-01  -1.6498  0.68486
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -55.958
ITERATION 2 ESTIMATES
0.61000  1.1179  0.44480  -0.30742  -0.31099E-01  1.7144  0.63240E-01  -2.0213  0.75025
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -55.560
ITERATION 3 ESTIMATES
0.62370  1.1363  0.44904  -0.31404  -0.31469E-01  1.8634  0.65039E-01  -2.0686  0.75393
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -55.548
ITERATION 4 ESTIMATES
0.62413  1.1368  0.44921  -0.31413  -0.31480E-01  1.8724  0.65077E-01  -2.0696  0.75389

VARIABLE   ESTIMATED     ASYMPTOTIC    T-RATIO    WEIGHTED      ELASTICITY
NAME       COEFFICIENT   STD. ERROR               AGGREGATE     AT MEANS
                                                  ELASTICITY
PUB12      0.62413       0.66847       0.93366    0.10588       0.10248
PUB34      1.1368        0.74861       1.5185     0.12577       0.10148
PUB5       0.44921       1.2500        0.35937    0.66268E-02   0.61577E-02
PRIV      -0.31413       0.77985      -0.40281   -0.11585E-01  -0.11295E-01
YEARS     -0.31480E-01   0.26096E-01  -1.2063    -0.93925E-01  -0.88468E-01
SCHOOL     1.8724        1.1255        1.6636     0.75959E-01   0.27663E-01
INCOME     0.65077E-01   0.35634E-01   1.8263     0.52655       0.48027
TAX       -2.0696        1.0383       -1.9932    -0.78308      -0.73375
CONSTANT   0.75389       1.1352        0.66411    0.26413       0.24491

SCALE FACTOR = 0.22761

VARIABLE   MARGINAL      ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME       EFFECT        CASE VALUES   X=0       X=1       MARGINAL EFFECT
PUB12      0.14206       0.0000        0.43871   0.59333   0.15462
PUB34      0.25874       0.0000        0.43871   0.70897   0.27026
PUB5       0.10224       0.0000        0.43871   0.55053   0.11182
PRIV      -0.71499E-01   0.0000        0.43871   0.36342  -0.75286E-01
YEARS     -0.71652E-02   8.5158
SCHOOL     0.42617       0.0000        0.43871   0.83562   0.39691
INCOME     0.14812E-01   23.094
TAX       -0.47105       1.0800

LOG-LIKELIHOOD FUNCTION = -55.548
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 14.9788 WITH 8 D.F.  P-VALUE= 0.05956

ESTRELLA R-SQUARE                 0.15452
MADDALA R-SQUARE                  0.14587
CRAGG-UHLER R-SQUARE              0.19853
MCFADDEN R-SQUARE                 0.11881
ADJUSTED FOR DEGREES OF FREEDOM   0.36838E-01
APPROXIMATELY F-DISTRIBUTED       0.15168 WITH ... AND ... D.F.
CHOW R-SQUARE                     0.13244

PREDICTION SUCCESS TABLE
                PREDICTED
ACTUAL       0       1
   0        14.     22.
   1         6.     53.

NUMBER OF RIGHT PREDICTIONS = 67.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.70526
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105
EXPECTED OBSERVATIONS AT 0 = 36.0   OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0   OBSERVED = 59.0
SUM OF SQUARED "RESIDUALS" = 19.397
WEIGHTED SUM OF SQUARED "RESIDUALS" = 89.109

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                      PREDICTED CHOICE          OBSERVED   OBSERVED
ACTUAL                0          1              COUNT      SHARE
   0                  16.718     19.282         36.000     0.379
   1                  19.282     39.718         59.000     0.621
                                                95.000     1.000
PREDICTED COUNT       36.000     59.000         95.000
PREDICTED SHARE       0.379      0.621          1.000
PROP. SUCCESSFUL      0.464      0.673          0.594
SUCCESS INDEX         0.085      0.052          0.065
PROPORTIONAL ERROR    0.000      0.000
NORMALIZED SUCCESS INDEX                        0.138

|_* Now use the log transformed form of income and taxes.
|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON

LOGIT ANALYSIS   DEPENDENT VARIABLE = YESVM   CHOICES = 2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS   CONVERGENCE TOLERANCE = 0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037
ITERATION 1 ESTIMATES
0.45375  0.92076  0.43035  -0.28835  -0.23416E-01  1.3330  1.6059  -1.7546  -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139
ITERATION 2 ESTIMATES
0.55298  1.0944  0.50979  -0.32984  -0.25855E-01  2.1655  2.0427  -2.2551  -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370
ITERATION 3 ESTIMATES
0.58166  1.1250  0.52500  -0.33987  -0.26178E-01  2.5635  2.1706  -2.3799  -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304
ITERATION 4 ESTIMATES
0.58362  1.1261  0.52605  -0.34139  -0.26129E-01  2.6239  2.1869  -2.3942  -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303
ITERATION 5 ESTIMATES
0.58364  1.1261  0.52606  -0.34142  -0.26127E-01  2.6250  2.1872  -2.3945  -5.2014

VARIABLE   ESTIMATED     ASYMPTOTIC    T-RATIO    WEIGHTED      ELASTICITY
NAME       COEFFICIENT   STD. ERROR               AGGREGATE     AT MEANS
                                                  ELASTICITY
PUB12      0.58364       0.68778       0.84858    0.93986E-01   0.91051E-01
PUB34      1.1261        0.76820       1.4659     0.11827       0.96460E-01
PUB5       0.52606       1.2693        0.41445    0.73664E-02   0.69375E-02
PRIV      -0.34142       0.78299      -0.43605   -0.11952E-01  -0.12037E-01
YEARS     -0.26127E-01   0.26934E-01  -0.97006   -0.73996E-01  -0.68592E-01
SCHOOL     2.6250        1.4101        1.8616     0.10108       0.28999E-01
LOGINC     2.1872        0.78781       2.7763     7.2529        6.7561
PTCON     -2.3945        1.0813       -2.2145    -5.5262       -5.1745
CONSTANT  -5.2014        7.5503       -0.68890   -1.7298       -1.6137

SCALE FACTOR = 0.22197

VARIABLE   MARGINAL      ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME       EFFECT        CASE VALUES   X=0       X=1       MARGINAL EFFECT
PUB12      0.12955       0.0000        0.44231   0.58706   0.14476
PUB34      0.24996       0.0000        0.44231   0.70978   0.26747
PUB5       0.11677       0.0000        0.44231   0.57304   0.13073
PRIV      -0.75785E-01   0.0000        0.44231   0.36049  -0.81814E-01
YEARS     -0.57995E-02   8.5158
SCHOOL     0.58267       0.0000        0.44231   0.91631   0.47400
LOGINC     0.48548       9.9711
PTCON     -0.53150       6.9395

LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F.  P-VALUE= 0.01255

ESTRELLA R-SQUARE                 0.19956
MADDALA R-SQUARE                  0.18529
CRAGG-UHLER R-SQUARE              0.25218
MCFADDEN R-SQUARE                 0.15442
ADJUSTED FOR DEGREES OF FREEDOM   0.75759E-01
APPROXIMATELY F-DISTRIBUTED       0.20544 WITH ... AND ... D.F.
CHOW R-SQUARE                     0.17197

PREDICTION SUCCESS TABLE
                PREDICTED
ACTUAL       0       1
   0        18.     18.
   1         7.     52.

NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105
EXPECTED OBSERVATIONS AT 0 = 36.0   OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0   OBSERVED = 59.0
SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                      PREDICTED CHOICE          OBSERVED   OBSERVED
ACTUAL                0          1              COUNT      SHARE
   0                  17.591     18.409         36.000     0.379
   1                  18.409     40.591         59.000     0.621
                                                95.000     1.000
PREDICTED COUNT       36.000     59.000         95.000
PREDICTED SHARE       0.379      0.621          1.000
PROP. SUCCESSFUL      0.489      0.688          0.612
SUCCESS INDEX         0.110      0.067          0.083
PROPORTIONAL ERROR    0.000      0.000
NORMALIZED SUCCESS INDEX                        0.177

|_* Use the LOG option to compute elasticities and marginal effects
|_* assuming log-transformed variables.
|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / LOG

LOGIT ANALYSIS   DEPENDENT VARIABLE = YESVM   CHOICES = 2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS   CONVERGENCE TOLERANCE = 0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037
ITERATION 1 ESTIMATES
0.45375  0.92076  0.43035  -0.28835  -0.23416E-01  1.3330  1.6059  -1.7546  -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139
ITERATION 2 ESTIMATES
0.55298  1.0944  0.50979  -0.32984  -0.25855E-01  2.1655  2.0427  -2.2551  -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370
ITERATION 3 ESTIMATES
0.58166  1.1250  0.52500  -0.33987  -0.26178E-01  2.5635  2.1706  -2.3799  -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304
ITERATION 4 ESTIMATES
0.58362  1.1261  0.52605  -0.34139  -0.26129E-01  2.6239  2.1869  -2.3942  -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303
ITERATION 5 ESTIMATES
0.58364  1.1261  0.52606  -0.34142  -0.26127E-01  2.6250  2.1872  -2.3945  -5.2014

ELASTICITIES ASSUME LOG-TRANSFORMED VARIABLES

VARIABLE   ESTIMATED     ASYMPTOTIC    T-RATIO    WEIGHTED      ELASTICITY
NAME       COEFFICIENT   STD. ERROR               AGGREGATE     AT MEANS
                                                  ELASTICITY
PUB12      0.58364       0.68778       0.84858    0.19410       0.18107
PUB34      1.1261        0.76820       1.4659     0.37451       0.34937
PUB5       0.52606       1.2693        0.41445    0.17495       0.16321
PRIV      -0.34142       0.78299      -0.43605   -0.11355      -0.10592
YEARS     -0.26127E-01   0.26934E-01  -0.97006   -0.86893E-02  -0.81059E-02
SCHOOL     2.6250        1.4101        1.8616     0.87301       0.81439
LOGINC     2.1872        0.78781       2.7763     0.72739       0.67856
PTCON     -2.3945        1.0813       -2.2145    -0.79633      -0.74287
CONSTANT  -5.2014        7.5503       -0.68890   -1.7298       -1.6137

SCALE FACTOR = 0.22197

MARGINAL EFFECTS ASSUME ALL VARIABLES ARE LOG-TRANSFORMED (EXCEPT DUMMY VARIABLES)

VARIABLE   MARGINAL      ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME       EFFECT        CASE VALUES   X=0       X=1       MARGINAL EFFECT
PUB12      0.12955       0.0000        0.44231   0.58706   0.14476
PUB34      0.24996       0.0000        0.44231   0.70978   0.26747
PUB5       0.11677       0.0000        0.44231   0.57304   0.13073
PRIV      -0.75785E-01   0.0000        0.44231   0.36049  -0.81814E-01
YEARS     -0.28859E-21   8.5158
SCHOOL     0.58267       0.0000        0.44231   0.91631   0.47400
LOGINC     0.21022E-04   9.9711
PTCON     -0.49214E-03   6.9395

LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F.  P-VALUE= 0.01255

ESTRELLA R-SQUARE                 0.19956
MADDALA R-SQUARE                  0.18529
CRAGG-UHLER R-SQUARE              0.25218
MCFADDEN R-SQUARE                 0.15442
ADJUSTED FOR DEGREES OF FREEDOM   0.75759E-01
APPROXIMATELY F-DISTRIBUTED       0.20544 WITH ... AND ... D.F.
CHOW R-SQUARE                     0.17197

PREDICTION SUCCESS TABLE
                PREDICTED
ACTUAL       0       1
   0        18.     18.
   1         7.     52.

NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105
EXPECTED OBSERVATIONS AT 0 = 36.0   OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0   OBSERVED = 59.0
SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                      PREDICTED CHOICE          OBSERVED   OBSERVED
ACTUAL                0          1              COUNT      SHARE
   0                  17.591     18.409         36.000     0.379
   1                  18.409     40.591         59.000     0.621
                                                95.000     1.000
PREDICTED COUNT       36.000     59.000         95.000
PREDICTED SHARE       0.379      0.621          1.000
PROP. SUCCESSFUL      0.489      0.688          0.612
SUCCESS INDEX         0.110      0.067          0.083
PROPORTIONAL ERROR    0.000      0.000
NORMALIZED SUCCESS INDEX                        0.177
|_STOP

ARCH Models
ARCH (autoregressive conditional heteroskedasticity) models recognize the presence of successive periods of relative volatility and stability. The error variance, conditional on past information, evolves over time as a function of past errors. The model was introduced by Engle [1982]. Bollerslev [1986] proposed the GARCH (generalized ARCH) conditional variance specification that allows for a parsimonious parameterisation of the lag structure. Considerable interest has been in applications of ARCH/GARCH models to high frequency financial time series.

The HET command in SHAZAM provides features for maximum likelihood estimation of models with ARCH or GARCH errors.

Examples

The examples in this section use a data set of daily exchange rate changes for the Deutschemark/British pound. The data set is from Bollerslev and Ghysels [1996] and has been adopted as a benchmark data set by McCullough and Renfro [1999] (also see the discussion in McCullough and Vinod [1999]). The model of interest is:

Yt = 100 [ln(Pt) - ln(Pt-1)] = μ + εt

where Pt is the bilateral Deutschemark/British pound exchange rate. The topics are:

Testing for ARCH
Estimation of a GARCH(1,1) model
Benchmark comparisons of coefficients and standard errors

A minimal sketch of a test for ARCH effects is given below.
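The commands that follow sketch the Engle [1982] LM test for ARCH(1) effects in the exchange-rate changes. Several items here are assumptions to be checked against the SHAZAM User's Reference Manual: the file name DMBP.txt and variable name Y are hypothetical, the sample length should be set to the actual length of the series, and the MEAN= option on STAT, the $R2 temporary variable and the TYPE=CHI option on DISTRIB are used as this author understands them.

SAMPLE 1 1974
READ (DMBP.txt) Y
* Compute the mean and form the deviations e(t) = Y(t) - mean
STAT Y / MEAN=MU
GENR E=Y-MU
* Regress the squared deviations on one lag
GENR E2=E*E
GENR E2LAG=LAG(E2)
SAMPLE 2 1974
OLS E2 E2LAG
* LM statistic = N*R-squared, chi-square with 1 d.f. under no ARCH
GEN1 LM=$N*$R2
SAMPLE 1 1
DISTRIB LM / TYPE=CHI DF=1 CDF=CDF1
GEN1 PVAL=1-CDF1
PRINT LM PVAL
STOP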

References

Baillie, R.T. and Bollerslev, T., "The Message in Daily Exchange Rates: A Conditional-Variance Tale", Journal of Business and Economic Statistics, Vol. 7, 1989, pp. 297-305.

Bollerslev, T., "Generalized Autoregressive Conditional Heteroskedasticity", Journal of Econometrics, Vol. 31, 1986, pp. 307-327.

Bollerslev, T. and Ghysels, E., "Periodic Autoregressive Conditional Heteroscedasticity", Journal of Business and Economic Statistics, Vol. 14, 1996, pp. 139-151.

Bollerslev, T. and Wooldridge, J.M., "Quasi Maximum Likelihood Estimation and Inference in Dynamic Models with Time Varying Covariances", Econometric Reviews, Vol. 11, 1992, pp. 143-172.

Engle, R.F., "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation", Econometrica, Vol. 50, 1982, pp. 987-1007.

Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and Lee, T., The Theory and Practice of Econometrics, Second Edition, Wiley, 1985.

McCullough, B.D. and Renfro, C.G., "Benchmarks and Software Standards: A Case Study of GARCH Procedures", Journal of Economic and Social Measurement, 1999, forthcoming.

McCullough, B.D. and Vinod, H.D., "The Numerical Reliability of Econometric Software", Journal of Economic Literature, Vol. 37, 1999, pp. 633-665.

Weiss, A.A., "Asymptotic Theory for ARCH Models: Estimation and Testing", Econometric Theory, Vol. 2, 1986, pp. 107-131.
Time Series Analysis

Index Numbers

Statistical agencies often report time series data in the form of index numbers. For example, the consumer price index is an important economic indicator. Therefore, it is useful to understand how index numbers are constructed and how to interpret them. An introduction to index numbers is in Paul Newbold [Statistics for Business & Economics, Fourth Edition, Prentice-Hall, 1995, Chapter 17, pp. 678-688].

First, consider computing a price index for a single item. Suppose that the price data is P1, P2, ..., PT and P0 is the price in some arbitrarily chosen base year. A price index is calculated as:

100 (Pt / P0)      for t = 1, 2, ..., T

The price index expresses the price in every period as a percentage of the base period price.

Example: The table below shows the average Canadian farm price per pound for wool (in cents). The data was retrieved from the CANSIM Statistics Canada data base (series code: D226903). The final column shows a price index where all prices are expressed as a percentage of the price in 1986. That is, the base year is 1986.
YEAR   PRICE   PRICE INDEX
1980   71.7    127.81
1981   64.2    114.44
1982   61.4    109.45
1983   54.6     97.33
1984   64.5    114.97
1985   58.2    103.74
1986   56.1    100.00
1987   65.6    116.93
1988   87.9    156.68
1989   75.2    134.05
1990   43.0     76.65
1991   29.3     52.23
1992   40.1     71.48
1993   36.7     65.42
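The index column can be reproduced with a few SHAZAM commands. This is a minimal sketch in which the file name WOOL.txt is hypothetical; 1986 is observation 7 of the 1980-1993 sample.

SAMPLE 1 14
READ (WOOL.txt) YEAR PRICE
* Base period price (observation 7 = 1986)
GEN1 PBASE=PRICE:7
GENR PINDEX=100*PRICE/PBASE
PRINT YEAR PRICE PINDEX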

The price index variable shows that the wool price in 1993 was 65% of the price in 1986. An aggregate price index for a group of commodities can be constructed by using some weighted average of the prices where the weights are quantities. Statistical agencies use this method to obtain the consumer price index. Various formulas have been proposed by researchers and an illustration with SHAZAM is given in the next section below.

When price index variables are used as explanatory variables in a regression equation the estimated coefficients must be interpreted appropriately. Consider time series Y and X where X is a price index that is 100 in the base year. The linear regression equation is:

Yt = β0 + β1 Xt + et

where et is a random error. The coefficient β1 measures the change in Y for a 1 percent change in the base period price. A revision to the base period will result in an adjustment to β1.

Now consider the log-linear regression equation:

ln(Yt) = β0 + β1 ln(Xt) + ut

where ut is a random error. With this specification the coefficient β1 has an interpretation as an elasticity. This measure has the appeal that it does not depend on units of measurement.
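As a quick illustration of a base period revision, an existing index variable PINDEX can be re-based so that (say) observation 12 becomes the new base period; the variable names here are hypothetical:

GEN1 PBNEW=PINDEX:12
GENR PINDEX2=100*PINDEX/PBNEW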

Calculating price indexes

The INDEX command in SHAZAM can be used to calculate a price index from a set of price and quantity data on a number of commodities. A number of alternative index formulas are available.

Further description is in the SHAZAM User's Reference Manual. In general, the format of the INDEX command is:
INDEX p1 q1 p2 q2 p3 q3 . . . / options

where p1, p2, ... are the prices, q1, q2, ... are the quantities, and options is a list of desired options.

Example

This example uses a data set provided by Newbold. The SHAZAM commands (filename: PRINDEX.SHA) that follow are used to compute some price indexes. For illustration purposes, observation 8 is selected for the base period. First, a price index for the stock price of a single car manufacturer is computed. Then an aggregate price index is computed using the prices and quantities of all 4 car manufacturers in the data set.
SAMPLE 1 12
* Weekly stock prices for major car manufacturers.
READ P1 P2 P3 P4
20.25 4.125 5.25 46.125
19.875 4.125 6.0 45.25
19.0 4.125 5.5 45.25
19.75 4.125 5.625 46.0
20.25 3.875 6.0 48.25
19.875 3.875 5.375 48.625
19.375 4.0 5.375 47.75
19.625 4.0 5.375 50.125
21.125 4.125 5.75 51.5
22.375 4.375 5.375 51.0
25.0 4.75 7.25 54.0
23.0 4.375 6.625 52.75
* Volume of shares, in hundreds of thousands, traded in each week.
READ Q1 Q2 Q3 Q4
8.2 4.3 14.4 27.1
6.3 1.5 16.0 12.9
6.7 1.3 6.9 12.1
4.5 1.9 4.4 13.6
4.3 2.7 5.0 21.9
5.4 1.5 3.8 17.3
3.8 1.7 3.1 11.7
4.3 1.5 3.8 23.8
5.4 1.8 4.9 17.0
9.5 3.5 4.1 21.4
13.7 4.4 18.1 25.0
8.3 2.6 11.3 20.5
* Compute a price index for the 1st car manufacturer - base period is week 8.
GEN1 PBASE1=P1:8
GENR PINDEX1=100*P1/PBASE1
* Compute an aggregate price index
INDEX P1 Q1 P2 Q2 P3 Q3 P4 Q4 / BASE=8 LASPEYRES=PALL
GENR PALL=100*PALL
GENR OBS=TIME(0)
FORMAT(F10.0,5X,2F13.2)
PRINT OBS PINDEX1 PALL / FORMAT
STOP


The BASE=8 option on the INDEX command is used to specify that the base period is observation number 8. The Laspeyres price index is saved in the variable PALL. SHAZAM sets the base period price index to 1.0. The price index variable can be multiplied by 100 to express the index in the more familiar form with 100 in the base period. The SHAZAM output is shown below.
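As a check on the output that follows, the Laspeyres index with base period 8 uses base-week quantities as weights:

    Lt = (Σ Pit Qi8) / (Σ Pi8 Qi8)    summed over the 4 stocks

For week 1 this gives (20.25x4.3 + 4.125x1.5 + 5.25x3.8 + 46.125x23.8) / (19.625x4.3 + 4.0x1.5 + 5.375x3.8 + 50.125x23.8) = 1210.99 / 1303.79 = 0.929, which matches the LASPEYRES column and, after scaling by 100, the first PALL value of 92.88.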


SHAZAM output - Calculating price indexes

|_SAMPLE 1 12
|_* Weekly stock prices for major car manufacturers.
|_READ P1 P2 P3 P4
 4 VARIABLES AND 12 OBSERVATIONS STARTING AT OBS 1
|_* Volume of shares, in hundreds of thousands, traded in each week.
|_READ Q1 Q2 Q3 Q4
 4 VARIABLES AND 12 OBSERVATIONS STARTING AT OBS 1
|_* Compute a price index for the 1st car manufacturer - base period is week 8.
|_GEN1 PBASE1=P1:8
|_GENR PINDEX1=100*P1/PBASE1
|_* Compute an aggregate price index
|_INDEX P1 Q1 P2 Q2 P3 Q3 P4 Q4 / BASE=8 LASPEYRES=PALL
 BASE PERIOD IS OBSERVATION 8
 LASPEYRE WILL BE STORED AS VARIABLE: PALL

              PRICE INDEX                           QUANTITY
      DIVISIA PAASCHE LASPEYRES FISHER    DIVISIA PAASCHE LASPEYRES FISHER
   1    .930    .935     .929    .932      1623.   1614.    1625.    1619.
   2    .925    .941     .914    .927      877.2   862.3    887.6    874.9
   3    .911    .920     .909    .915      788.3   780.3    789.6    784.9
   4    .929    .932     .926    .929      804.0   801.3    806.8    804.0
   5    .972    .971     .970    .970      1218.   1220.    1221.    1221.
   6    .974    .975     .973    .974      1000.   999.6    1002.    1001.
   7    .957    .958     .956    .957      685.3   684.5    686.1    685.3
   8   1.000   1.000    1.000   1.000      1304.   1304.    1304.    1304.
   9   1.033   1.034    1.031   1.033      992.8   991.6    994.1    992.9
  10   1.031   1.036    1.025   1.031      1301.   1295.    1308.    1302.
  11   1.114   1.127    1.095   1.111      1656.   1637.    1685.    1661.
  12   1.073   1.077    1.063   1.070      1267.   1262.    1278.    1270.
|_GENR PALL=100*PALL
|_GENR OBS=TIME(0)
|_FORMAT(F10.0,5X,2F13.2)
|_PRINT OBS PINDEX1 PALL / FORMAT
       OBS      PINDEX1         PALL
        1.       103.18        92.88
        2.       101.27        91.38
        3.        96.82        90.95
        4.       100.64        92.60
        5.       103.18        96.95
        6.       101.27        97.33
        7.        98.73        95.58
        8.       100.00       100.00
        9.       107.64       103.13
       10.       114.01       102.55
       11.       127.39       109.48
       12.       117.20       106.31
|_STOP

Moving Averages and Exponential Smoothing


The SMOOTH command provides features for smoothing data by methods of moving averages and exponential smoothing. Consider a time series with observed values X1, X2, ..., XN. A centered 5-point moving average is obtained as:

    Xt* = (Xt-2 + Xt-1 + Xt + Xt+1 + Xt+2) / 5    for t = 3, ..., N-2

The number of periods used in calculating the moving average is specified with the NMA= option on the SMOOTH command.

The simple exponential smoothing method is based on a weighted average of current and past observations, with most weight to the current observation and declining weights to past observations. This gives the formula for the smoothed series as:

    St = w Xt + (1 - w) St-1    with S1 = X1

where w is a smoothing constant with a value in the range [0,1]. The value for w is specified with the WEIGHT= option on the SMOOTH command.

Example

This example analyzes annual sales data (in thousands of dollars) of Lydia E. Pinkham from 1931 to 1960. The data set is listed in Newbold [1995, p. 691]. The SHAZAM commands (filename: MASMOOTH.SHA) below use the SMOOTH command to calculate a centered 5-point moving average and a series smoothed by exponential smoothing.
SAMPLE 1 30
READ SALES / BYVAR
1806 1644 1814 1770 1518 1103 1266 1473 1423 1767
2161 2336 2602 2518 2637 2177 1920 1910 1984 1787
1689 1866 1896 1684 1633 1657 1569 1390 1387 1289
GENR YEAR=TIME(1930)
* Set the smoothing constant for exponential smoothing.
GEN1 A=0.4
GEN1 W=1-A
SMOOTH SALES / NMA=5 WEIGHT=W MAVE=MA5
* Graph the original data
GRAPH SALES YEAR / LINEONLY
* Graph the smoothed series
SAMPLE 3 28
GRAPH MA5 YEAR / LINEONLY
STOP

The SHAZAM output is shown below. The results for a centered 5-point moving average are listed on the SHAZAM output in the column MOVING-AVE (see Newbold [1995, Table 17.12, p. 698]). The results from exponential smoothing are listed in the column EXP-MOV-AVE (see Newbold [1995, Table 17.16, p. 710]). In the above SHAZAM commands, the MAVE= option on the SMOOTH command is used to save the moving average in the variable MA5. The GRAPH command is then used to graph the results. The first graph plots the sales data (SALES against YEAR; see Newbold [1995, Figure 17.6, p. 695]).

The second graph shows the series smoothed by moving averages (MA5 against YEAR; see Newbold [1995, Figure 17.7, p. 698]).
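As a quick check of the formulas above against the output below: the first reported moving average (observation 3) is (1806 + 1644 + 1814 + 1770 + 1518)/5 = 1710.4, and with w = 0.6 the exponentially smoothed value at observation 2 is 0.6(1644) + 0.4(1806) = 1708.8. Both match the MOVING-AVE and EXP-MOV-AVE columns.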

Forecasting with Exponential Smoothing

Exponential smoothing methods use recursive updating formulas to generate forecasts. A comparison of these methods with ARIMA models is given in Mills [1990, pp. 153-163]. The recursive formulas required by exponential smoothing methods can be programmed in SHAZAM (a sketch follows the list below). This is shown with examples from Newbold [1995, Chapter 17].

Simple Exponential Smoothing - for nonseasonal series with no trend



Holt-Winters Exponential Smoothing - for nonseasonal series with trend
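For the simple method, the recursion can be programmed directly with GENR commands. The sketch below is illustrative rather than one of the linked examples: it assumes the SALES data from the example above are in memory, and it relies on GENR evaluating observations in sequence so that LAG(S) picks up the smoothed value just computed for the previous observation.

* Simple exponential smoothing programmed with GENR (w = 0.6).
GEN1 W=0.6
* Initialize the smoothed series at the first observation.
SAMPLE 1 1
GENR S=SALES
* Apply the recursion St = W*Xt + (1-W)*St-1 over the remaining observations.
SAMPLE 2 30
GENR S=W*SALES+(1-W)*LAG(S)
SAMPLE 1 30
* The forecast for period 31 is the final smoothed value.
GEN1 FORECAST=S:30
PRINT FORECAST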

References

Mills, T.C., Time Series Techniques for Economists, Cambridge University Press, 1990.
Newbold, P., Statistics for Business & Economics, Fourth Edition, Prentice-Hall, 1995.

SHAZAM output - Moving Averages and Simple Exponential Smoothing
|_SAMPLE 1 30
|_READ SALES / BYVAR
 1 VARIABLES AND 30 OBSERVATIONS STARTING AT OBS 1
|_GENR YEAR=TIME(1930)
|_* Set the smoothing constant for exponential smoothing.
|_GEN1 A=0.4
|_GEN1 W=1-A
|_SMOOTH SALES / NMA=5 WEIGHT=W MAVE=MA5
 CENTRAL MOVING AVERAGES - PERIODS= 5  NSPAN= 1  WEIGHT= 0.600
 OBSERVATION    SALES   MOVING-AVE   SEAS&IRREG   SA(SALES )   EXP-MOV-AVE
      1        1806.0     -------      -------      1806.0        1806.0
      2        1644.0     -------      -------      1644.0        1708.8
      3        1814.0      1710.4       1.0606      1814.0        1771.9
      4        1770.0      1569.8       1.1275      1770.0        1770.8
      5        1518.0      1494.2       1.0159      1518.0        1619.1
      6        1103.0      1426.0      0.77349      1103.0        1309.4
      7        1266.0      1356.6      0.93322      1266.0        1283.4
      8        1473.0      1406.4       1.0474      1473.0        1397.2
      9        1423.0      1618.0      0.87948      1423.0        1412.7
     10        1767.0      1832.0      0.96452      1767.0        1625.3
     11        2161.0      2057.8       1.0502      2161.0        1946.7
     12        2336.0      2276.8       1.0260      2336.0        2180.3
     13        2602.0      2450.8       1.0617      2602.0        2433.3
     14        2518.0      2454.0       1.0261      2518.0        2484.1
     15        2637.0      2370.8       1.1123      2637.0        2575.9
     16        2177.0      2232.4      0.97518      2177.0        2336.5
     17        1920.0      2125.6      0.90327      1920.0        2086.6
     18        1910.0      1955.6      0.97668      1910.0        1980.6
     19        1984.0      1858.0       1.0678      1984.0        1982.7
     20        1787.0      1847.2      0.96741      1787.0        1865.3
     21        1689.0      1844.4      0.91574      1689.0        1759.5
     22        1866.0      1784.4       1.0457      1866.0        1823.4
     23        1896.0      1753.6       1.0812      1896.0        1867.0
     24        1684.0      1747.2      0.96383      1684.0        1757.2
     25        1633.0      1687.8      0.96753      1633.0        1682.7
     26        1657.0      1586.6       1.0444      1657.0        1667.3
     27        1569.0      1527.2       1.0274      1569.0        1608.3
     28        1390.0      1458.4      0.95310      1390.0        1477.3
     29        1387.0     -------      -------      1387.0        1423.1
     30        1289.0     -------      -------      1289.0        1342.7
 1 SEASONAL FACTORS
 1.0000
|_* Graph the original data
|_GRAPH SALES YEAR / LINEONLY
 30 OBSERVATIONS
 SHAZAM WILL NOW MAKE A PLOT FOR YOU
 NO SYMBOLS WILL BE PLOTTED, LINE ONLY
|_* Graph the smoothed series
|_SAMPLE 3 28
|_GRAPH MA5 YEAR / LINEONLY
 26 OBSERVATIONS
 SHAZAM WILL NOW MAKE A PLOT FOR YOU
 NO SYMBOLS WILL BE PLOTTED, LINE ONLY
|_STOP

Financial Time Series


Charts of Stock Market Prices
Portfolio Selection
Option Pricing

Diana Whistler (diana@shazam.econ.ubc.ca)
Copyright 1995 - 2008 All Rights Reserved.

SHAZAM Resources for:


R. Carter Hill, William E. Griffiths and Guay C. Lim, Principles of Econometrics, Third Edition, Wiley, 2008.

Data Sets

Interpreting SHAZAM output from the food expenditure equation estimation (pdf file)


Command Files: Tested with SHAZAM Version 10.

SHAZAM missing value codes
Chapters 2, 3, 4  The Simple Linear Regression Model

olsSHA.txt      The food expenditure equation
testSHA.txt     Interval estimation and hypothesis testing

samp10SHA.txt   A repeated sampling experiment
predictSHA.txt  Prediction
nortestSHA.txt  The Jarque-Bera test for normally distributed errors
funcSHA.txt     An example of choosing a functional form
grateSHA.txt    Estimating a growth rate
wageSHA.txt     A wage equation and prediction with a log-dependent variable

Chapters 5, 6  The Multiple Regression Model

salesSHA.txt    Sales model for a hamburger chain
beerSHA.txt     Restricted least squares

famincSHA.txt   Model specification and the RESET test
carSHA.txt      A model of car gasoline consumption

Chapter 7  Nonlinear Relationships

wage7SHA.txt    A wage equation - polynomials and applications of dummy variables
houseSHA.txt    Dummy variables: The university effect on house prices
pizzaSHA.txt    Demand for pizza

wage75SHA.txt   The wage equation with a log-dependent variable

Chapter 8  Heteroskedasticity

hetSHA.txt      White standard errors, weighted least squares, multiplicative heteroskedasticity and testing for heteroskedasticity


glsSHA.txt      A heteroskedastic partition and the Goldfeld-Quandt test

Chapter 9  Dynamic Models, Autocorrelation and Forecasting

sugarSHA.txt    An area response model for sugarcane

autcovSHA.txt   SHAZAM calculations for Newey-West adjusted standard errors and the residual correlogram
irateSHA.txt    Autoregressive (AR) models, finite distributed lags and autoregressive distributed lag (ARDL) models

Chapter 11  Simultaneous Equations Models

systemSHA.txt   2SLS estimation for the demand and supply of truffles
fishSHA.txt     Supply and demand at the Fulton Fish Market

Chapter 14  Time-Varying Volatility and ARCH Models

archSHA.txt     Estimation of ARCH models

Chapter 15  Panel Data Models

poolSHA.txt     An investment function for two firms

pool10SHA.txt   The fixed effects model
panelSHA.txt    A microeconomic panel data model

Chapter 16  Qualitative and Limited Dependent Variable Models

probitSHA.txt   Probit and logit estimation of a model of transport choice
medalSHA.txt    Poisson regression: A model of medal totals at the Olympic Games
tobitSHA.txt    Tobit model estimation
mrozSHA.txt     Tobit estimation of a labour supply function

* R. Carter Hill, William E. Griffiths and Guay C. Lim,
* Principles of Econometrics, Third Edition, Wiley, 2008.
* Working with the Simple Linear Regression Model
* Chapters 2, 3, and 4.
* Data set on food expenditure and weekly income from a
* random sample of 40 households.
SAMPLE 1 40


READ (food.txt) FOOD INCOME
* The STAT command reports descriptive statistics (Table 2.1, p. 18)
STAT FOOD INCOME
* The OLS command reports estimation results that include:
* - OLS parameter estimates (p. 22)
* - the income elasticity evaluated at the sample means (p. 24)
* - estimated standard errors (p. 36).
* The estimation results are summarized in Figure 2.9, p. 25.
* The OLS command also reports:
* - the estimate of the error variance SIGMA**2 (p. 35)
* - sum of squared residuals SSE (p. 24)
* The LIST option gives the predicted values and residuals (Table 2.3, p. 35).
* The right portion of SHAZAM output from the LIST option shows a
* crude residual plot.
* The PCOV option reports the estimated variances and covariances of
* the least squares estimators (p. 35).

* The OLS command also reports:
* - t-test statistics for "tests of significance" and a p-value for
*   a 2-sided test (p. 63).
* - the R-square (p. 83).
OLS FOOD INCOME / LIST PCOV
STOP

Examples to accompany the SHAZAM User's Reference Manual Version 10


A note on using SHAZAM procedures (see below) should be reviewed. The SHAZAM examples can also be run over the internet (see below).

3. Data Input and Output

rdataSHA.txt     Reading character data with the FORMAT command.
readcharSHA.txt  Reading character data with the CHARVARS= option.

4. Descriptive Statistics

anovaSHA.txt     A two-way ANOVA table.
teststatSHA.txt  t-test statistic for differences in population mean and an F-test
                 statistic for different population variances. Data file : urate.txt
stemleafSHA.txt  Stem-and-Leaf Display

5. Plots and Graphs

graphSHA.txt     Graph of monthly time series data with dates on the x-axis.
                 Data file : urate.txt

6. Generating Variables

log10SHA.txt     Working with logarithms to the base 10.
wreplaceSHA.txt  Sampling Without Replacement.

7. Ordinary Least Squares

homeownSHA.txt   Weighted Least Squares - Analysis of Proportions Data

8. Hypothesis Testing and Confidence Intervals

hyptestSHA.txt   Linear and non-linear hypothesis tests.
confidSHA.txt    Interval estimation for a population mean.
confid2SHA.txt   Confidence ellipse for 2 regression coefficients from 2SLS estimation.
                 Also see systemSHA.txt. Data file : klein.txt

9. Inequality Restrictions

bayesSHA.txt     Linear regression with inequality restrictions.
sureqSHA.txt     SURE with inequality restrictions.

10. ARIMA Models

pacfSHA.txt      Calculation of the partial autocorrelation function.
arimaSHA.txt     ARIMA estimation - an example from Enders.
arseasSHA.txt    Seasonal ARIMA models - examples with the Box and Jenkins airline
                 passenger data set and the Enders Spanish tourism data set.

11. Autocorrelation Models

ar1SHA.txt       Estimation and forecasting for a model with AR(1) errors. The commands
                 show how to replicate the estimation results of the AUTO command by
                 using OLS on transformed observations.

13. Cointegration and Unit Root Tests

unitrootSHA.txt  Tests for unit roots using the Perron test applied to the
                 Nelson-Plosser data set. Data file : nelplos.txt
johansen         Johansen trace test procedure for cointegration.
                 Command file : johanSHA.txt  Data file : macro.txt

14. Diagnostic Tests

diagnosSHA.txt   Examples of programming test statistics in SHAZAM by illustrating some
                 of the computations implemented by the DIAGNOS command. Calculations for
                 tests for autocorrelation and tests for heteroskedasticity are shown.
recurSHA.txt     More examples of programming test statistics. Calculations for recursive
                 coefficient estimates and the Hansen test of model stability are shown.
resetSHA.txt     More examples of programming test statistics. Calculations for RESET
                 tests are shown.

15. Distributed-Lag Models

dlagSHA.txt      Estimation of distributed lag models including Almon lags.
gcauseSHA.txt    Testing for Granger causality. Data file : judge18.txt

16. Forecasting

poolfcSHA.txt    Forecasting with time-series cross-section data.

17. Fuzzy Set Models

fuzzySHA.txt     Measuring the underground economy using the methodology of Giles
                 and Draeseke.

18. Generalized Entropy

gmeSHA.txt       Example of generalized entropy estimation.

19. Generalized Least Squares

glsar1SHA.txt    Example of generalized least squares estimation for the model with
                 AR(1) errors.

20. Heteroskedastic Models

hetregSHA.txt    Multiplicative Heteroskedasticity. Data file : credit.txt

21. Maximum Likelihood Estimation of Non-Normal Models

poissonSHA.txt   Poisson regression.
mlebetaSHA.txt   Models with Beta-Distributed Dependent Variables. Data file : soss.txt

22. Nonlinear Regression

nlcesSHA.txt     Estimation of a CES production function and testing for
                 autocorrelated errors.
nlsureSHA.txt    Nonlinear seemingly unrelated regression applied to the estimation of
                 a linear expenditure system.
sysnlSHA.txt     N2SLS, N3SLS and GMM estimation applied to Klein's Model I.
                 Also see systemSHA.txt. Data file : klein.txt
maxfuncSHA.txt   Maximizing a function of a single variable.

Examples of the LOGDEN option

mhetSHA.txt      Estimation of the multiplicative heteroskedastic error model.
boxhetSHA.txt    Maximum likelihood estimation of Box-Cox models with heteroskedasticity.
poisnlSHA.txt    Poisson regression. Note that Poisson regression is implemented with the
                 MLE command as shown in the command file poissonSHA.txt.
tobithetSHA.txt  Tobit model with heteroskedasticity. Data file : mroz.txt
homeownSHA.txt   Analysis of Proportions Data

23. Nonparametric Methods

nonparSHA.txt    Nonparametric regression of a nonlinear function.
semiparSHA.txt   Robinson's semiparametric regression.

24. Pooled Cross-Section Time-Series

poolSHA.txt      Estimation methods available with the POOL command.
poolfcSHA.txt    Forecasting with time-series cross-section data.
poolecSHA.txt    Pooling with error components - an example of programming in SHAZAM.

25. Probit and Logit Regression

logitSHA.txt     Logit model estimation - comparisons with the probit model are also
                 shown. Data file : school.txt
probitSHA.txt    Probit model estimation and Heckit procedure. Data file : mroz.txt
logitwSHA.txt    Weighted Logit estimation

26. Robust Estimation

ladSHA.txt       Least Absolute Error estimation. Calculation of bootstrap standard
                 errors is also shown. Data file : industry.txt

27. Time-Varying Linear Regression

flsSHA.txt       Flexible least squares simulation experiment from Kalaba and Tesfatsion.

28. Tobit Regression

tobitSHA.txt     Tobit regression. Data file : judge19.txt
tobitmSHA.txt    Calculating marginal effects for Tobit models including the McDonald and
                 Moffitt (1980) decomposition. Data file : judge19.txt

29. Two-Stage Least Squares and Systems of Equations

systemSHA.txt    Estimation of Klein's Model I by 2SLS and 3SLS. Data file : klein.txt
hetcovSHA.txt    Computation of heteroskedasticity-consistent standard errors for 2SLS
                 estimation. Data file : klein.txt

30. Data Smoothing, Moving Averages and Seasonal Adjustment

smoothSHA.txt    Seasonal adjustment.
expsmthSHA.txt   Moving averages and exponential smoothing.

31. Financial Time Series

stockSHA.txt     Chart of stock market prices. Data file : spy.txt
portfolSHA.txt   Portfolio selection problem. Data file : p.txt
eurocallSHA.txt  Pricing European call options.
bspriceSHA.txt   Black-Scholes formula for a call option price, put option price and
                 implied volatility.

32. Linear Programming

lpSHA.txt        Linear Programming.

33. Matrix Manipulation

matrixSHA.txt    Matrix operations.
matolsSHA.txt    OLS estimation with the MATRIX command.

34. Price Indexes

prindexSHA.txt   Computing price indexes.

35. Principal Components and Factor Analysis

pcompSHA.txt     Example of multicollinearity diagnostics and principal components
                 regression.

36. Probability Distributions

pvalueSHA.txt    Calculating p-values for test statistics.
distchiSHA.txt   Calculating probabilities for chi-square.
distfSHA.txt     Calculating probabilities for non-central F.

37. Sorting Data

wreplaceSHA.txt  Sampling Without Replacement.

40. Programming in SHAZAM

spliceSHA.txt    Splicing price index series.
powerSHA.txt     Computing the power of a test.
ridgeSHA.txt     Ridge Regression.
nlsrocSHA.txt    Nonlinear least squares by the rank one correction method.
mcarloSHA.txt    Monte Carlo experiments.
bootSHA.txt      Bootstrapping regression coefficients.
olscovSHA.txt    Estimating the variance of the OLS estimator in the presence of
                 heteroskedastic errors or autocorrelated errors.
hausmanSHA.txt   Hausman specification test for errors in variables.
nonnestSHA.txt   Non-Nested model testing.
solveSHA.txt     Solving nonlinear sets of equations.
mnlogitSHA.txt   Multinomial logit estimation.

More Examples

corSHA.txt       Computing p-values for correlation coefficients.
vifSHA.txt       Computation of Variance Inflation Factors as an indicator of the
                 severity of multicollinearity.
archprogSHA.txt  ARCH estimation using Engle's algorithm.
probelasSHA.txt  Computing elasticities from probit estimation when variables have been
                 log-transformed. Also see logitSHA.txt. Data file : school.txt
bvprobSHA.txt    Bivariate Probit models - Testing for zero error correlation by
                 computing a Lagrange multiplier test statistic. Data file : school.txt
fimlSHA.txt      Full information maximum likelihood - Klein Model I.
                 Data file : klein.txt

41. SHAZAM Procedures

sqrta sqrtm      Square root of a matrix using an eigenvalue-vector decomposition and
                 the Golub-Van Loan procedure. Command file : sqrtmSHA.txt
bs impvol        Black-Scholes option pricing and implied volatility.
                 Command file : bsvolSHA.txt
                 Note that Black-Scholes option pricing is implemented with the CALL and
                 PUT commands as shown in the command file bspriceSHA.txt.
liml             Limited information maximum likelihood.
                 Command file : limlSHA.txt  Data file : klein.txt
multi            Generating multivariate random numbers. Command file : multiSHA.txt

More Procedures

dwpvalue         Calculation of a p-value for the Durbin-Watson statistic.
                 Command file : dwpvalueSHA.txt
gauss            Nonlinear equation estimation by the Gauss-Newton method.
                 Command file : gaussSHA.txt
granger          Testing for Granger causality. Command file : grangerSHA.txt
                 Data file : judge18.txt
                 Granger causality tests are also available using the commands shown
                 in gcauseSHA.txt.
huf              Robinson's heteroskedasticity of unknown form estimator.
                 Command file : hufSHA.txt
qr hh            Solving OLS with the Householder transformation.
                 Command file : qrSHA.txt
randcoef         Random coefficients models - pooled time-series cross-section data.
                 Command file : randSHA.txt
seasroot         Tests for seasonal unit roots. Command file : seasSHA.txt
                 Data file : gdpcan.txt
stest            Stationarity tests proposed by Leybourne and McCabe.
                 Command file : stestSHA.txt  Data file : citibase.txt
ols              Replication of the SHAZAM OLS command. Command file : olsSHA.txt

A Note on using SHAZAM Procedures

SHAZAM command files that use SHAZAM procedures may need a revision to the FILE PROC command. This is required to ensure that the procedure file can be located. Further details are in the chapter SHAZAM PROCEDURES.
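For example, a hypothetical revision, with the placeholder replaced by the actual location of the procedure file on your system:

FILE PROC <path to the procedure file>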

Run SHAZAM over the Internet

The SHAZAM examples can be run over the internet from: shazam.econ.ubc.ca/runshazam

Data files can be loaded with the READ command:
READ (data/filename) variable_list / options

where filename is the name of the data file and variable_list is the list of variable names. Upper and lower case are not interchangeable for filenames.
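For example, assuming the food expenditure data file used earlier in this guide is stored as food.txt in the data directory:

READ (data/food.txt) FOOD INCOME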


To run SHAZAM procedures over the internet the procedure commands must be embedded in the command file.
* Analysis of Proportions Data
* References:
* William Greene, Econometric Analysis, Fourth Edition,
* Chapter 19.4.6, pp. 834-837.
* Gujarati, Basic Econometrics, Third Edition,
* Chapter 16, pp. 556-561.
SAMPLE 1 10
* Model of home ownership
* Data set from Gujarati, Table 16.4, p. 557.
* X = family income - thousands of $
* NTOT = number of families in the group
* N = number of families owning a house.
READ X NTOT N
6 40 8
8 50 12
10 60 18
13 80 28
15 100 45
20 70 36
25 65 39
30 50 33
35 40 30
40 25 20
* Calculate the proportion of families that own a house in
* each income group.
GENR P=N/NTOT
* Calculate the odds ratio in favor of owning a house.
GENR ODDS=P/(1-P)
* Calculate the logit -- the proportions are in the interval (0,1)
* but the logit has the range on the real number line.
GENR L=LOG(ODDS)
* METHOD A: WLS (see Gujarati)
* Consider heteroskedastic errors - calculate the weights
GENR W=NTOT*P*(1-P)
* Weighted Least Squares
OLS L X / WEIGHT=W NONORM NOMULSIGSQ
* Estimate the percent change in the weighted odds in favour of owning
* a house for a unit increase in weighted income (X).
* With the TEST command, this is reported as the TEST VALUE.
TEST 100*(EXP(X)-1)
* METHOD B: 2-step WLS (see Greene)
* Use a 2-step estimation procedure
* Step 1: OLS
OLS L X / PREDICT=YHAT COEF=BETA
* Calculate weights for the logit model
GENR PHAT=EXP(YHAT)/(1+EXP(YHAT))
GENR W=NTOT*PHAT*(1-PHAT)
* Step 2: Weighted Least Squares
* The UT option is used to obtain PREDICTed values that are
* UnTransformed (calculated with the unweighted data).
OLS L X / WEIGHT=W NONORM NOMULSIGSQ UT PREDICT=LHAT COEF=BHAT
TEST 100*(EXP(X)-1)
* Predict the probability of owning a house at various income levels.
GENR PHAT=EXP(LHAT)/(1+EXP(LHAT))
* Calculate marginal effects - the change in the probability of
* owning a house per unit change in income.
GENR MARGINAL=(BHAT:1)*PHAT*(1-PHAT)
FORMAT(F12.0,2F12.3)
PRINT X PHAT MARGINAL / FORMAT
* METHOD C: Maximum Likelihood Estimation - Probit Model
* Specify the log-likelihood function (see Greene, p. 836).
FBX: NCDF(B1*X+B0)
NL 1 / NCOEF=2 LOGDEN START=BETA
EQ NTOT*(P*LOG([FBX])+(1-P)*LOG(1-[FBX]))
END
TEST 100*(EXP(B1)-1)
STOP

