Several parametric and nonparametric methods have been advanced over the
years for estimating house price appreciation. This paper compares five of these
methods in terms of predictive accuracy, using data from Montgomery County,
Pennsylvania. The methods are evaluated on the basis of the mean squared pre-
diction error and the mean absolute prediction error. A statistic developed by
Diebold and Mariano is used to determine whether differences in prediction errors
are statistically significant. We use the same statistic to determine the effect of
sample size on the accuracy of the predictions. In general, parametric methods of
estimation produce more accurate estimates of house price appreciation than
nonparametric methods. And when the mean absolute prediction error is used as
the criterion of accuracy, the repeat sales method produces the most accurate
estimate among the parametric methods we tested. Finally, of the five methods we
tested, the accuracy of the repeat sales method is least diminished by a reduction
in sample size. © 1992 Academic Press, Inc.
* The views expressed in this paper are those of the authors and do not necessarily reflect
those of the Federal Reserve Bank of Philadelphia or the Federal Reserve System.
1051-1377/92 $5.00
Copyright © 1992 by Academic Press, Inc.
All rights of reproduction in any form reserved.
ESTIMATING HOUSE PRICE APPRECIATION 325
more frequently than for the parametric methods. Also, for all five meth-
ods, reducing the sample size had less effect on the mean squared predic-
tion error than on the mean absolute prediction error.
Section 1 of the paper describes the data used to estimate and evaluate
all the models discussed in Section 2. A brief comparison of the estimated
appreciation rates is presented in Section 3. Section 4 evaluates the five
methods based on the Diebold-Mariano test. Section 5 draws some con-
clusions about the choice of estimation methods for empirical research.
1. THE DATA
The primary data source for this comparative study is the Montgomery
County, Pennsylvania, tax assessment file. From that file we extracted
15,197 repeat sales of single-family detached dwellings whose second sale
occurred between 1973 and 1988. No house was included in the sample
more than once no matter how many times it was sold between 1973 and
1988 because the data on the file included only the two most recent sale
prices. Using a random number generator, we divided this sample into two
subsamples, one with 7873 observations used to estimate the appreciation
rates and one with 7324 observations used to test the predictive accuracy
of the appreciation-rate estimates. Because of the small number of
observations for some years, however, the test sample used in this paper
was limited to houses whose first and second sales occurred between 1976
and 1988, reducing the size of the ultimate test sample to 5687. The estimating
sample includes nearly 5% of all single-family detached dwellings in
Montgomery County, and the estimating and ultimate test sample to-
gether include more than 9% of the single-family detached units in the
county. The average interval between sales in the estimating sample was
5.6 years.
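The split of the repeat-sales file into estimating and test subsamples, followed by the 1976-1988 restriction on the test sample, can be sketched as below. This is a minimal illustration, not the authors' procedure: the paper does not say which random number generator was used, and the record layout (dicts with hypothetical keys `first_year` and `second_year`) and the function names are assumptions.

```python
import random


def split_sample(records, n_estimating, seed=0):
    """Randomly partition repeat-sales records into estimating and test subsamples."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled[:n_estimating], shuffled[n_estimating:]


def restrict_to_window(records, first_year, last_year):
    """Keep only pairs whose first and second sales both fall in [first_year, last_year]."""
    return [r for r in records
            if r["first_year"] >= first_year and r["second_year"] <= last_year]


# Usage with the sample counts reported in the text; the records are made up.
pairs = [{"first_year": 1970 + i % 10, "second_year": 1980 + i % 9} for i in range(15197)]
estimating, test = split_sample(pairs, n_estimating=7873)
ultimate_test = restrict_to_window(test, 1976, 1988)
```

Fixing the seed makes the partition reproducible, which matters when several estimation methods must be evaluated on exactly the same test observations.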
For each housing unit in the sample, we have information on the most
recent sale price, the previous sale price, the dates of sale, a number of
housing attributes, and the location by census tract.¹ The housing
attributes include the presence or absence of central air conditioning, a
fireplace, a garage, or a pool, the number of bathrooms, the age of the house,
the size of the house, and the lot size. The data on house price and
attributes are merged with data at the census tract level that provide
additional information on neighborhood characteristics and accessibility
to Philadelphia’s Central Business District (CBD). From the 1980 Census
1 We have data on transactions in 192 of the 200 Montgomery County census tracts. We
eliminated sales of less than $10,000 and more than $1,000,000. The data on the housing
traits are at the time of the second sale. It was not possible to determine if there had been
any changes in traits between sales.
TABLE I
Means of the Variables
                                       Estimating sample   Test sample
                                       (N = 7873)          (N = 5687)
Housing attributes
  House price (1990 $)                 135,243             136,442
  Central air (Y/N)                    0.25                0.23
  Number of bathrooms                  1.84                1.81
  Fireplace (Y/N)                      0.58                0.55
  Age (years)                          37.4                39.0
  Garage (Y/N)                         0.75                0.74
  Pool (Y/N)                           0.07                0.07
  Living area (sq. ft.)                1,907               1,865
  Lot size (sq. ft.)                   17,732              17,469
Neighborhood attributes
  Average household size               2.89                2.89
  % Population black                   3.39                3.34
  % Single family detached             68.1                67.9
Accessibility attributes
  Train service (Y/N)                  0.52                0.51
  Average commuting time (minutes)     22.5                22.4
  Highway time to the CBD (minutes)    54.2                54.2
Note. The highway travel time to the CBD is from the Delaware Valley Regional Planning
Commission. Average commuting time is from the 1980 Census of Population and Housing.
The presence of a commuter rail station in the neighborhood is calculated as in Voith (1993).
328 CRONE AND VOITH
cific traits over time and one constraining all trait prices to increase at the
same rate. The third parametric method uses the first- and second-sale
prices to estimate annual appreciation rates.
The nonparametric methods. The simplest nonparametric estimate of
house price appreciation is the percentage increase in the mean price of
houses sold in any two periods. This method is frequently used in the
popular press but takes no account of differences in the quality of houses
sold in one period versus another. The use of the median house price
rather than the mean price is sometimes seen as an attempt to control for
differences in the quality of houses sold in different periods. The sale of
an abnormal number of very high-priced or very low-priced units in a
given quarter or year should influence the median price less than the mean
price. And perhaps the most frequently used measure of house price
appreciation for metropolitan areas in the United States is the percentage
increase in the median price of houses sold, as calculated by the National
Association of Realtors (NAR).³
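The two nonparametric measures reduce to the percentage change in a summary statistic of the prices recorded in consecutive years. The sketch below is illustrative only: the function name and the toy prices are hypothetical, and nothing in it controls for house quality.

```python
from statistics import mean, median


def nonparametric_rates(prices_by_year, stat):
    """Percentage change in a summary price statistic between consecutive years.

    prices_by_year maps a year to the list of sale prices recorded in that year;
    stat is a summary function such as statistics.mean or statistics.median.
    """
    years = sorted(prices_by_year)
    return {
        year: stat(prices_by_year[year]) / stat(prices_by_year[prev]) - 1.0
        for prev, year in zip(years, years[1:])
    }


# Hypothetical prices: one very expensive house sells in 1975.
prices = {1974: [100_000, 120_000, 140_000], 1975: [105_000, 126_000, 600_000]}
mean_rates = nonparametric_rates(prices, mean)
median_rates = nonparametric_rates(prices, median)
```

In this toy example the single $600,000 sale pushes the mean-based rate for 1975 above 130 percent, while the median-based rate is 5 percent, which is the sense in which the median is less sensitive to an unusual mix of sales.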
In the first two columns of Table II we report the housing appreciation
rates in Montgomery County from 1974 through 1988 estimated by these
two nonparametric methods. In calculating these appreciation rates, we
included each house only for the year of the second sale, in order to keep
the sample size for the nonparametric methods the same as for the other
methods that we examined. In the repeat sales method, discussed later, the
first and second sales together constitute a single observation.
The hedonic methods. The usual method of controlling for differences
in quality when comparing house prices across different markets or time
periods is to estimate a hedonic equation. The original purpose of these
models was to compare the price of a constant-quality house across hous-
ing markets (see Thibodeau, 1989), but the method can also be used to
estimate housing appreciation rates in a given market. In a straightfor-
ward application of the models used in cross-market comparisons, hous-
ing trait prices are estimated for each period in a single market. By apply-
ing the estimated hedonic prices to a standard set of attributes, the price
of a constant-quality house is calculated for each period. In our first
application of the hedonic method, trait prices are estimated for each year
according to the model in Eq. (1). We estimated the model in logs, so this
unconstrained hedonic model has the same basic specification as the
constrained model discussed below.
$$\ln P_t = X_t \beta_t + \varepsilon_t \qquad (1)$$
3 The NAR is careful to point out that the change in the median sale price does not
measure the change in the cost of a standard house.
TABLE II
Estimated Real Annual Appreciation Rates
        Mean sales   Median sales   Unconstrained   Constrained   Repeat
Year    price        price          hedonic         hedonic       sales
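where, for year $t$, $P_t$ is the sale price, $X_t$ is the vector of characteristics, $\beta_t$ is the vector of trait prices, and $\varepsilon_t$ is an error term.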
A series of hedonic equations was estimated using sales data for individual
years, allowing the vector of estimated trait prices (β) to vary freely
from year to year. The sale price (P) was the price at the second sale, so
each of the 7873 observations was used only once in estimating the series
of equations. Thus, the size of the total sample was kept the same as that
for the other methods of estimation. The characteristics (X) for which
hedonic prices were estimated are those listed in Table I.⁴
4 The full set of regression estimates for this equation and the regression results for Eqs. (2)
and (5) below are available from the authors upon request. The coefficients on the majority
of housing and neighborhood characteristics are highly significant and of the expected sign
and magnitude.
A constant-quality house price was computed for each year using the
estimated βs and a standard set of housing characteristics, namely, the
mean value of each characteristic given in column 1 of Table I. The
appreciation rates were calculated from the estimated price of this
constant-quality house. Since the estimates of β are allowed to vary from
year to year, we labeled this the unconstrained hedonic method. The
estimates of the yearly appreciation rates are reported in column 3 of
Table II.
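The unconstrained hedonic procedure (a separate log-price regression for each year, then the fitted price of one fixed bundle of traits) can be sketched as follows. This is a minimal illustration under assumptions: ordinary least squares via NumPy, an intercept plus the supplied characteristics, and a hypothetical function name and data layout. It is not the authors' code, and the synthetic check uses invented coefficients.

```python
import numpy as np


def unconstrained_hedonic_rates(data_by_year, x_standard):
    """Year-by-year hedonic regressions priced at a fixed, standard bundle of traits.

    data_by_year maps a year to (X, log_price): X is an (n, k) array of
    characteristics and log_price an (n,) array of log second-sale prices.
    x_standard is the (k,) standard bundle, e.g. the estimating-sample means.
    """
    years = sorted(data_by_year)
    x_std = np.concatenate(([1.0], x_standard))  # prepend 1 for the intercept
    log_std_price = {}
    for year in years:
        X, log_price = data_by_year[year]
        design = np.column_stack([np.ones(len(X)), X])
        beta, *_ = np.linalg.lstsq(design, log_price, rcond=None)  # OLS, free each year
        log_std_price[year] = x_std @ beta
    # Appreciation of the constant-quality house between consecutive years.
    return {
        year: float(np.expm1(log_std_price[year] - log_std_price[prev]))
        for prev, year in zip(years, years[1:])
    }


# Synthetic check: trait prices change between years, so the index must use x_standard.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(40, 2)), rng.normal(size=(40, 2))
data = {
    1977: (X1, 11.00 + X1 @ np.array([0.10, 0.20])),
    1978: (X2, 11.05 + X2 @ np.array([0.12, 0.18])),
}
rates = unconstrained_hedonic_rates(data, x_standard=np.array([1.0, 2.0]))
```

Because the trait prices move freely, the estimated appreciation depends on which standard bundle is priced; here the bundle's log price rises from 11.50 to 11.53.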
We also estimated appreciation rates using a second hedonic model in
which all the trait prices were constrained to increase at the same rate in
any given year. The model is represented in Eq. (2).
$\delta_{t+j} = \ln(1 + \alpha_{t+j})$, where $\alpha_{t+j}$ is the rate of change in house prices between period $t + j - 1$ and period $t + j$.
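Eq. (2) itself is not legible in this copy. One common way to impose the constraint that all trait prices grow at the same rate is a pooled regression with a single trait-price vector plus cumulative year indicators, so that the coefficient on year h is δ_h = ln(1 + α_h). The sketch below assumes that specification; it may differ in detail from the authors' Eq. (2), and the function name and synthetic data are hypothetical.

```python
import numpy as np


def constrained_hedonic_rates(X, log_price, sale_year, base_year, last_year):
    """Pooled hedonic regression with one trait-price vector and cumulative year indicators.

    Assumed model: ln P_i = b0 + X_i b + sum of delta_h over years h up to the sale year,
    so every trait price is scaled by the same factor exp(delta_h) = 1 + alpha_h in year h.
    """
    years = np.arange(base_year + 1, last_year + 1)
    # D[i, h] = 1 if house i was sold in year h or later
    D = (sale_year[:, None] >= years[None, :]).astype(float)
    design = np.column_stack([np.ones(len(X)), X, D])
    coef, *_ = np.linalg.lstsq(design, log_price, rcond=None)
    delta = coef[1 + X.shape[1]:]  # the year coefficients
    return dict(zip(years.tolist(), np.expm1(delta).tolist()))


# Synthetic check: 50 sales in each year 1976-1979 with known annual log changes.
rng = np.random.default_rng(1)
sale_year = np.repeat(np.arange(1976, 1980), 50)
X = rng.normal(size=(200, 2))
true_delta = np.array([0.02, -0.01, 0.05])  # for 1977, 1978, 1979
cum = np.concatenate(([0.0], np.cumsum(true_delta)))[sale_year - 1976]
log_price = 11.0 + X @ np.array([0.1, 0.2]) + cum
rates = constrained_hedonic_rates(X, log_price, sale_year, 1976, 1979)
```

Compared with the year-by-year regressions, this pools all years to estimate one set of trait prices, which is why it needs fewer observations per year.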
5 In the modern literature, the case for using repeat sales was first made by Bailey et al.
(1963). Case and Quigley (1991) argue for a hybrid hedonic/repeat sales method. But the
superiority of this method is questioned in Case et al. (1991).
$$\ln\left(\frac{P_t}{P_{t-k}}\right) = \sum_{j=0}^{k-1} \delta_{t-j}$$
where $P_{t-k}$ is the market price of the house in the year of the previous sale, $k$ years prior to the second sale.
Thus, data on repeat sales transactions with at least some sales in each
year can be used to estimate yearly appreciation rates. The estimating
equation is
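$$\ln\left(\frac{P_t}{P_{t-k}}\right) = \sum_{h=1}^{H} \delta_h D_h + \varepsilon \qquad (5)$$
where $D_h = 1$ if year $h$ falls after the year of the first sale and no later than the year of the second sale, $D_h = 0$ otherwise, $H$ is the number of annual periods in the sample, and $\varepsilon$ is an error term.
The estimated coefficient $\delta_h$ will equal $\ln(1 + \alpha_h)$, where $\alpha_h$ is the housing
price appreciation rate in year $h$. It should be noted that the estimate of $\alpha_h$
is determined not only by the houses sold in year $h$ but also by houses
whose first sale was prior to year $h$ and whose second sale was after year
$h$. The estimates derived from Eq. (5) are reported as the repeat sales
estimates in the fifth column of Table II.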
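A repeat-sales regression of the kind described above can be sketched as follows: the log ratio of second to first sale price is regressed, without an intercept, on annual indicators that equal 1 for the years after the first sale up to and including the second sale. The indicator coding here is an assumption consistent with the text (every year between the two sales contributes to the estimate). The function name and the five sale pairs are hypothetical.

```python
import numpy as np


def repeat_sales_rates(first_year, second_year, p_first, p_second, base_year, last_year):
    """Repeat-sales regression of ln(P_second / P_first) on annual indicators, no intercept.

    D[i, h] = 1 when year h is after house i's first sale and no later than its
    second sale, so the coefficient on year h estimates delta_h = ln(1 + alpha_h).
    """
    years = np.arange(base_year + 1, last_year + 1)
    D = ((first_year[:, None] < years[None, :])
         & (years[None, :] <= second_year[:, None])).astype(float)
    y = np.log(p_second / p_first)
    delta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return dict(zip(years.tolist(), np.expm1(delta).tolist()))


# Five hypothetical sale pairs with true annual rates of 5%, -2% and 3% for 1974-1976.
alpha = {1974: 0.05, 1975: -0.02, 1976: 0.03}
first = np.array([1973, 1974, 1975, 1973, 1974])
second = np.array([1974, 1975, 1976, 1976, 1976])
p1 = np.full(5, 100_000.0)
p2 = np.array([p * np.prod([1 + alpha[h] for h in range(f + 1, s + 1)])
               for p, f, s in zip(p1, first, second)])
rates = repeat_sales_rates(first, second, p1, p2, 1973, 1976)
```

The pair sold in 1973 and 1976 informs all three annual coefficients at once, which is the point made in the text about houses whose sales straddle year h.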
Since house prices in our sample were adjusted for inflation, the appre-
ciation rates reported in Table II are estimated real appreciation rates.
When we compare the estimated appreciation rates year by year, there is
no obvious ordering according to method. Each of the methods produced
the highest estimated appreciation in at least 1 year and the lowest esti-
mated appreciation in at least 1 year. The largest differences in estimated
rates are concentrated in the years 1974 through 1976. The smaller number
of observations in the years 1973 to 1975 resulted in less precise
estimates of the appreciation rates.⁶ For some of the smaller samples used
to test the effects of sample size on accuracy there were not enough
observations to estimate the appreciation rates for 1974-1976 by the un-
constrained hedonic method. Therefore, we confined all the tests for ac-
curacy to the period 1977-1988.
The last two rows in Table II present the cumulative estimated appreciation
for both the 15-year period 1973 to 1988 and the 12-year period 1976
to 1988. The estimates derived from the median sale price produced the
lowest cumulative appreciation over both periods. The constrained he-
donic method produced a low cumulative appreciation for the 15-year
period, but this is due mainly to the large estimated decline in 1974. For
the 12-year period 1976 to 1988 the cumulative appreciation from the
constrained hedonic method is closer to that of the other parametric
methods.
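Assuming the cumulative figures compound the annual rates, the arithmetic is a running product. The body of Table II is not reproduced here, so the rates in the sketch are invented; they only illustrate why a large 1974 decline lowers the 15-year total while leaving the 12-year total (which compounds 1977 through 1988) untouched.

```python
import math


def cumulative_appreciation(rates, start, end):
    """Compound annual rates for the years start+1 through end into one total change."""
    return math.prod(1.0 + rates[year] for year in range(start + 1, end + 1)) - 1.0


# Invented annual real rates: a 20% drop in 1974, flat 1975-1976, then 2% a year.
rates = {1974: -0.20, 1975: 0.0, 1976: 0.0, **{y: 0.02 for y in range(1977, 1989)}}
total_15 = cumulative_appreciation(rates, 1973, 1988)  # compounds 1974 through 1988
total_12 = cumulative_appreciation(rates, 1976, 1988)  # compounds 1977 through 1988
```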
The pattern of signs on the estimated appreciation rates differs between
the nonparametric methods (columns 1 and 2) and the parametric methods
(columns 3, 4, and 5). The signs for all three parametric methods, how-
ever, are the same in any given year except for the early years 1974
through 1976. Moreover, the signs on the appreciation rates estimated by
the parametric methods reveal a clear cyclical pattern in the early 1980s.
For the recession years 1980-1982 and the preceding year, all the parametric
estimates are negative.
Every observation in our sample includes the house price for the most
recent sale and the previous sale, which provides a unique opportunity to
evaluate the accuracy of the five series of estimated appreciation rates.
We use the out-of-sample prediction error of the appreciation rate as the
6 No appreciation rate could be estimated for 1973, since that is the base year for our
sample. In terms of second sales, our estimating sample included only 21 observations for
1973, 78 observations for 1974, and 114 observations for 1975. In the other years the number
of observations ranged from 197 to 1199.
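$$\bar{d} = \frac{1}{n}\sum_{t=1}^{n}\left[g(e_{it}) - g(e_{jt})\right].$$
Under standard regularity conditions,
$$\sqrt{n}\,(\bar{d} - \mu) \xrightarrow{d} N\left(0,\; 2\pi f_d(0)\right),$$
where $f_d(0)$ is the spectral density of $d_t = g(e_{it}) - g(e_{jt})$ at frequency zero and $\mu$ is the population mean of $d_t$. Based on this distribution, Diebold and Mariano (1991) have developed a statistic
$$DM = \frac{\bar{d}}{\sqrt{2\pi \hat{f}_d(0)/n}},$$
which, under the null hypothesis of equal predictive accuracy ($\mu = 0$), is asymptotically distributed as
$$DM \sim N(0,1).$$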
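The statistic can be computed as sketched below. Two parts are assumptions rather than the paper's stated procedure: the hypothetical helper `prediction_errors` measures each test house's error as the log of its actual price ratio minus the compounded estimated appreciation between the two sales (the paper's own definition falls on a page not reproduced here), and 2πf_d(0) is estimated with Bartlett-weighted autocovariances, one standard choice. With `max_lag=0` this reduces to the sample variance of the loss differential.

```python
import math

import numpy as np


def prediction_errors(first_year, second_year, p_first, p_second, rates):
    """Hypothetical helper: log error in predicting each second-sale price.

    The prediction compounds the estimated annual rates over the years after
    the first sale up to and including the second sale.
    """
    errors = []
    for f, s, p1, p2 in zip(first_year, second_year, p_first, p_second):
        predicted = sum(math.log1p(rates[h]) for h in range(f + 1, s + 1))
        errors.append(math.log(p2 / p1) - predicted)
    return np.array(errors)


def diebold_mariano(e_row, e_col, loss=np.square, max_lag=0):
    """DM statistic for loss(e_row) - loss(e_col) and a two-sided normal p-value.

    2*pi*f_d(0) is estimated with Bartlett-weighted autocovariances; with
    max_lag=0 this is just the sample variance of the loss differential.
    """
    d = loss(np.asarray(e_row)) - loss(np.asarray(e_col))
    n = len(d)
    d_bar = d.mean()
    dc = d - d_bar
    long_run_var = dc @ dc / n
    for k in range(1, max_lag + 1):
        weight = 1.0 - k / (max_lag + 1)
        long_run_var += 2.0 * weight * (dc[k:] @ dc[:-k]) / n
    dm = d_bar / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|DM|))
    return dm, p_value


# Invented errors: the "row" method misses by half as much as the "column" method.
e_small = np.array([0.1, -0.1, 0.2, -0.2] * 25)
e_large = 2.0 * e_small
dm, p = diebold_mariano(e_small, e_large)                         # squared-error loss
dm_abs, p_abs = diebold_mariano(e_small, e_large, loss=np.abs)    # absolute-error loss
```

A negative DM means the row method has the smaller average loss, matching the sign convention (loss from the method in the row minus loss from the method in the column) of Tables III and IV.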
TABLE III
Diebold-Mariano Test Statistic
for Difference in Mean Squared Prediction Error
(Loss from Method in Row Minus Loss from Method in Column):
N = 5687
diction error. The negative values of the test statistic in the last column
and the probabilities in parentheses indicate that, based on this criterion,
the other four methods are significantly more accurate than the median
sales price method. The parametric methods are also more accurate than
the mean sales price method, but the difference is unambiguously signifi-
cant at the 0.05 level only for the constrained hedonic method, marginally
significant at the 0.05 level for the unconstrained hedonic method, and
marginally significant at the 0.10 level for the repeat sales method. Among
the three parametric methods, the differences in accuracy based on the
mean squared prediction error are not statistically significant at the 0.05
level. The constrained hedonic method, however, is more accurate than
the repeat sales method at the 0.10 level.
When we use the mean absolute prediction error as the criterion, the
pattern of relative accuracy changes somewhat (see Table IV). The me-
dian sales price method is again the least accurate of the five methods, but
it is not significantly less accurate than the unconstrained hedonic
method. The two hedonic methods are not significantly more accurate
than the mean sales price method. Finally, under the mean absolute error
criterion, the repeat sales method is significantly more accurate than the
other methods tested, including the two other parametric methods, and the
constrained hedonic method is significantly more accurate than the un-
constrained hedonic method.
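Why the ranking of methods can change with the criterion: squared error weights a few large misses heavily, whereas absolute error does not. The two error vectors below are invented for illustration and are not results from the paper.

```python
import numpy as np

# Invented errors for two hypothetical methods (not results from the paper).
errors_a = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # usually exact, one large miss
errors_b = np.array([0.3, 0.3, 0.3, 0.3, 0.3])  # uniformly moderate misses

mse_a, mse_b = np.mean(errors_a ** 2), np.mean(errors_b ** 2)        # 0.20 vs 0.09
mae_a, mae_b = np.mean(np.abs(errors_a)), np.mean(np.abs(errors_b))  # 0.20 vs 0.30
```

Method b wins on mean squared error and method a wins on mean absolute error, so a method that is usually accurate but occasionally far off can look best under one criterion and not the other.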
In general, for our large sample, parametric methods produced smaller
mean squared prediction errors than nonparametric methods, in most
cases significantly smaller at the 0.05 or 0.10 level. Among the parametric
methods, there was a significant difference in accuracy at the 0.05 level
only when the mean absolute prediction error was used as the criterion
rather than the mean squared prediction error.⁹
TABLE IV
Diebold-Mariano Test Statistic
for Difference in Mean Absolute Prediction Error
(Loss from Method in Row Minus Loss from Method in Column)
N = 5687
Predictive accuracy is affected not only by the method of estimation but
also by the size of the estimating sample. This is especially important in
the case of the repeat sales method because the need for two sales of the
same property seriously reduces the size of available samples. To examine
the effect of sample size on the accuracy of the five methods discussed in
this article, we randomly generated from our full estimating sample of
7873 houses 20 subsamples approximately one-half the size of the total
estimating sample and 20 subsamples approximately one-quarter the size
of the total estimating sample. We estimated yearly appreciation rates by
the five methods discussed in this paper using each of these 40 subsamples.
Applying the Diebold-Mariano statistic, we calculated for each
method how many of the 40 subsample estimates were less accurate than
the full sample estimates at the 0.05 level of significance. The results are
reported in Table V.
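The subsample experiment can be sketched as a loop that draws random subsamples, re-estimates, and counts significant losses of accuracy. The callables `estimate` and `compare` are hypothetical placeholders for an estimation method and a Diebold-Mariano comparison on the fixed test sample. Whether the paper used a one- or two-sided test is not stated in this section; the sketch counts a subsample only when DM is positive (the subsample loses more) and the p-value is below `alpha`.

```python
import random


def count_significant_losses(estimate, compare, sample, share, n_draws=20, alpha=0.05, seed=0):
    """Count subsamples whose estimates are significantly less accurate than the full sample's.

    estimate(records) returns estimated rates; compare(rates_sub, rates_full) returns
    (dm, p_value) for loss(subsample) - loss(full sample) on a fixed test sample.
    Both are hypothetical callables supplied by the user.
    """
    rng = random.Random(seed)
    full_rates = estimate(sample)
    size = round(len(sample) * share)
    count = 0
    for _ in range(n_draws):
        subsample = rng.sample(sample, size)  # drawn without replacement
        dm, p_value = compare(estimate(subsample), full_rates)
        if dm > 0 and p_value < alpha:  # positive DM: the subsample loses more
            count += 1
    return count


# Stub usage: the stand-in "estimate" is the sample size, and the stub comparison
# declares every smaller sample significantly worse.
def stub_compare(rates_sub, rates_full):
    return (1.0, 0.01) if rates_sub < rates_full else (0.0, 1.0)


sample = list(range(7873))
half = count_significant_losses(len, stub_compare, sample, share=0.5)
quarter = count_significant_losses(len, stub_compare, sample, share=0.25)
```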
Our tests indicate that the accuracy of the parametric methods is less
sensitive to sample size than the accuracy of the nonparametric methods.
Using the mean squared prediction error as the criterion of accuracy, the
nonparametric methods are significantly less accurate 9 or 10 times out of
20 when the half-sized sample is used for the estimation. They are
significantly less accurate 10 or 15 times out of 20 when the quarter-sized
sample is used for the estimation. Among the parametric methods, the
highest proportion of significant reductions in accuracy based on the
mean squared error was 5 out of 20 when the quarter-sized sample was
used to estimate the unconstrained hedonic model. When the mean absolute
prediction error is used as the criterion of accuracy, reducing the
sample size tended to have a greater effect on the predictive accuracy of
all methods. But the effect on the parametric methods was again smaller
than the effect on the nonparametric methods, and the repeat sales method
was least affected by the reduction in sample size.
9 The general conclusions in this paragraph also hold true when the full test sample of 7324
observations is used and the estimated appreciation rates for the years 1974 through 1976 are
included.
TABLE V
Effects of Sample Size on Accuracy
Repeat   Constrained   Unconstrained   Mean sales   Median sales
sales    hedonic       hedonic         price        price
5. CONCLUSION
REFERENCES
ABRAHAM, J. M., AND SCHAUMAN, W. S. (1991). “New Evidence on House Prices from
Freddie Mac Repeat Sales,” Amer. Real Estate Urban Econ. Assoc. J. 19, 333-352.
BAILEY, M. J., MUTH, R. F., AND NOURSE, H. O. (1963). “A Regression Method for Real
Estate Price Index Construction,” J. Amer. Statist. Assoc. 58, 933-942.
CASE, B., POLLAKOWSKI, H. O., AND WACHTER, S. M. (1991). “On Choosing among House
Price Index Methodologies,” Amer. Real Estate Urban Econ. Assoc. J. 19, 286-307.
CASE, B., AND QUIGLEY, J. M. (1991). “The Dynamics of Real Estate Prices,” Rev. Econ.
Statist. 73, 50-58.
CASE, K. E., AND SHILLER, R. J. (1987). “Prices of Single-Family Homes Since 1970: New
Indexes for Four Cities,” New Eng. Econ. Rev. September/October, 45-56.
CASE, K. E., AND SHILLER, R. J. (1989). “The Efficiency of the Market for Single-Family
Homes,” Amer. Econ. Rev. 79, 125-137.
CLAPP, J. M., GIACCOTTO, C., AND TIRTIROGLU, D. (1991). “Housing Price Indices Based
on All Transactions Compared to Repeat Subsamples,” Amer. Real Estate Urban Econ.
Assoc. J. 19, 270-285.
DIEBOLD, F. X., AND MARIANO, R. S. (1991). “Comparing Predictive Accuracy I: An
Asymptotic Test,” Department of Economics, University of Pennsylvania, mimeo.
HAURIN, D. R., AND HENDERSHOTT, P. H. (1991). “House Price Indexes: Issues and
Results,” Amer. Real Estate Urban Econ. Assoc. J. 19, 259-269.
HENDERSHOTT, P. H., AND THIBODEAU, T. G. (1990). “The Relationship between Median
and Constant Quality House Prices: Implications for Setting FHA Loan Limits,” Amer.
Real Estate Urban Econ. Assoc. J. 18, 323-334.
MARK, J. H., AND GOLDBERG, M. A. (1984). “Alternative Housing Price Indices: An Evalu-
ation,” Amer. Real Estate Urban Econ. Assoc. J. 12, 31-49.
MEESE, R., AND WALLACE, N. (1991). “Nonparametric Estimation of Dynamic Hedonic
Price Models and the Construction of Residential House Price Indices,” Amer. Real
Estate Urban Econ. Assoc. J. 19, 308-332.
PALMQUIST, R. B. (1980). “Alternative Techniques for Developing Real Estate Price In-
dexes,” Rev. Econ. Statist. 62, 442-448.
THIBODEAU, T. G. (1989). “Housing Price Indexes from the 1974-83 SMSA Annual Housing
Surveys,” Amer. Real Estate Urban Econ. Assoc. J. 17, 110-117.
VOITH, R. P. (1993). “Changing Capitalization of CBD Oriented Transportation Systems:
Evidence from Philadelphia 1970-1988,” J. Urban Econ. 33, 361-376.