
Housing Price Prediction Model

Business Research Assignment
Full-time MBA 2009, Utrecht

Date of submission: 17th November 2009
Word count: 900 words (excluding Appendix)

FT MBA 09, UB Number: 09028224


Table of Contents

Executive Summary
Introduction
  Objective
  Data and Methodology
Data Analysis
Linear Multiple Regression Analysis
Conclusion
Recommendations
Appendix

Table of Illustrations

Tables
Table 1: Descriptive statistics of each variable from District A and B
Table 2: Correlations table
Table 3: Multiple Regression Analysis of Price, H_Size, Age, District, H_Dist and Age_Dist

Charts
Chart 1: Box plot of price in District A and B
Chart 2: Histogram of residual values from regression model 3
Chart 3: Residual values plotted against predicted values from regression model 3
Chart 4: Scatter plot (Y axis = Price, X axis = H_Size, Z axis = Age), separated by district

Housing Price Prediction Model

November 17, 2009

Executive Summary
This report develops a reliable housing price prediction model to forecast the selling price in District A and District B using linear multiple regression. Our model explains 88.6% of the total variation in price within the relevant range of house size and house age.

Introduction
Objective

To develop a regression model as a tool for predicting the selling price of residential properties in both districts of the city.

Data and Methodology
Several real estate agents and property assessors were interviewed to identify the major explanatory variables that might affect property prices. The following independent variables were considered:
Quantitative variables: H_Size (house size in square feet), L_Size (lot size in acres), Age (house age in years), Attract (an attractiveness rating of the property ranging from 0 to 100, the higher the better), P_Tax (property tax of the prior year in dollars), N_Rooms (number of bedrooms in the house)
Qualitative variable: District (the district in the city: 0 for District A, 1 for District B)
The data consist of 625 properties sold in the past 3 months. We used linear multiple regression with a dummy variable (District) and interaction terms (H_Dist: H_Size*District, Age_Dist: Age*District) to find the forecasting model that gives the most suitable relationship between the independent variables and Price (the dependent variable) in each district.

Data Analysis

Chart 1: Box plot of price in District A (District = 0) and District B (District = 1)
The median property price in District B is higher than that in District A, and neither district shows outliers, so no extreme values distort the price data.

Real Estate Association

Table 1: Descriptive statistics of each variable from District A and B


Table 1 shows that the average housing price in District B (USD 453,980.94) is higher than that in District A (USD 226,174.77). Correspondingly, the average property tax in District B is higher than that in District A (USD 5,300.90 in District B and USD 1,655.65 in District A). Additionally, the average house size and lot size in District B are 4,055.05 square feet and 1.4568 acres respectively, larger than those in District A, which are 2,032.47 square feet and 0.6608 acres respectively. The average age of a house in District B is 47.28 years, older than in District A, where it is 12.57 years. Average attractiveness and number of bedrooms are not significantly different between the two districts.

Table 2: Correlations table

As shown in Table 2, Attract and N_Rooms have no significant relationship with Price. However, H_Size, L_Size, Age and P_Tax are significantly correlated with one another, which means that there is multicollinearity among the independent variables.

Linear Multiple Regression Analysis

Table 3: Multiple Regression Analysis of Price, H_Size, Age, District, H_Dist and Age_Dist

Model 1:
$\widehat{Price} = 50215.092 + 94.322\,\text{H\_Size} - 1241.796\,\text{Age} + 6.994\,\text{H\_Dist} + 1087.526\,\text{Age\_Dist}$
p-values (intercept, H_Size, Age, H_Dist, Age_Dist): 0.000, 0.000, 0.000, 0.023, 0.001
R² = 0.886, Adjusted R² = 0.885, Std. Error of the Estimate = 46669.901


Model 2:
$\widehat{Price} = 47888.970 + 95.212\,\text{H\_Size} - 1211.297\,\text{Age} + 4.856\,\text{H\_Dist} + 1029.883\,\text{Age\_Dist} + 8885.618\,\text{District}$
p-values (intercept, H_Size, Age, H_Dist, Age_Dist, District): 0.000, 0.000, 0.000, 0.384, 0.004, 0.646
R² = 0.886, Adjusted R² = 0.885, Std. Error of the Estimate = 46699.619

Model 3:
$\widehat{Price} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age} + 1025.601\,\text{Age\_Dist} + 22962.084\,\text{District}$
p-values (intercept, H_Size, Age, Age_Dist, District): 0.000, 0.000, 0.000, 0.004, 0.031
R² = 0.886, Adjusted R² = 0.885, Std. Error of the Estimate = 46690.584

Chart 2: Histogram of residual values from regression model 3

Chart 3: Residual values plotted against predicted values from regression model 3

All models have the same R² and adjusted R². Model 1 has the smallest standard error of the estimate. However, Model 1 violates the principle of marginality: it is not practical to stipulate and fit a model that includes interaction terms but omits the main effect of the dummy variable. Model 2 was dropped because the coefficients of H_Dist and District are not statistically significant at the 5% level (p-values of 0.384 and 0.646). Therefore, Model 3 is the most appropriate forecasting model. Firstly, the p-values of all coefficients show that every regressor has a significant effect on Price. Secondly, even though the VIFs of Age and Age_Dist exceed 10, which indicates collinearity between them, this is acceptable because Age_Dist is the interaction term of Age and District. Lastly, residual analysis of Model 3 shows no evidence against the normality, constant-variance and independence-of-errors assumptions (see Charts 2 and 3).
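The VIF criterion used above is VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing regressor j on all the other regressors. The following is a minimal numpy sketch, assuming the regressors sit in a matrix without an intercept column; the function name `vif` and the toy columns are hypothetical and are not the report's SPSS data.

```python
import numpy as np


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of X.

    X holds the regressors only (no intercept column). Column j is regressed
    on an intercept plus the remaining columns, and VIF_j = 1 / (1 - R_j^2).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Hypothetical toy regressors: x2 is x1 plus a small wiggle, x3 is unrelated.
x1 = np.arange(10, dtype=float)
x2 = x1 + 0.1 * (-1.0) ** np.arange(10)
x3 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
X = np.column_stack([x1, x2, x3])
print(vif(X).round(2))
```

The two near-duplicate columns get very large VIFs while the unrelated one stays close to 1. A VIF above 10 is a common warning threshold, but as argued above, a high VIF between a main effect and its own interaction term is expected by construction.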


Conclusion

Chart 4: Scatter plot (Y axis = Price, X axis = H_Size, Z axis = Age), separated by district

Prediction equation for District A (District = 0):
$\widehat{Price} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age}$

Prediction equation for District B (District = 1):
$\widehat{Price} = 64987.609 + 98.102\,\text{H\_Size} - 186.44\,\text{Age}$

88.6% of the total variation in price can be explained by the size and age of the house in both districts. House size has a positive effect on price, whereas house age has a negative effect. In both districts, when H_Size increases by 1 square foot while Age stays the same, the predicted price increases by USD 98.102. Moreover, when Age increases by 1 year while H_Size stays the same, the predicted price in District A decreases by USD 1,212.041, but in District B it decreases by only USD 186.44.

Recommendations

Our price prediction equations for each district are reliable only if the H_Size and Age values lie within the relevant range. In District A, H_Size ranges from 1,101 to 3,000 square feet and Age from 0 to 25 years. In District B, H_Size ranges from 2,520 to 5,493 square feet and Age from 11 to 80 years.

UB Number: 09028224
Full-time MBA 2009,
TiasNimbas Business School, Utrecht
The Netherlands


Appendix
1 Define objective
To develop a regression model as a tool for predicting the selling price of residential properties in both districts of the city.

2 Specify model
Using a linear multiple regression model (one dependent variable and several independent variables)

3 Collect data
The data consist of 625 properties sold in the past 3 months in District A and District B.

3.1 Dependent variable

Price (House selling price in USD)

3.2 Initial independent variables
Quantitative

H_Size (House size in square feet)

L_Size (Lot size in acres)

Age (House age in years)

Attract (An attractiveness rating of the property ranging from 0 to 100, the higher the better)

P_Tax (Property tax of the prior year in dollars)

N_Rooms (Number of bedrooms in the house)

Qualitative

District (The district in the city: 0 for District A, 1 for District B)


4 Descriptive Data Analysis

Figure 1: Box plot of Price
There are no outliers in Price. The median price in District B is higher than that in District A.

Figure 2: Box plots of all quantitative variables (H_Size, L_Size, Age, P_Tax, N_Rooms, Attract)
There are no outliers in any of the independent variables. H_Size, L_Size, Age and P_Tax show the same box-plot pattern.
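Box plots conventionally flag a point as an outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile (Tukey's fences). A minimal stdlib sketch of that rule follows, assuming the default method of `statistics.quantiles`; SPSS computes its box hinges slightly differently, so borderline cases may differ. The price list is hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical prices in USD; the last one is deliberately extreme.
prices = [210_000, 225_000, 198_000, 240_000, 232_000, 219_000, 950_000]
print(tukey_outliers(prices))
print(tukey_outliers(prices[:-1]))
```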


Figure 3: Descriptive statistics of each variable from District A (District = 0) and District B (District = 1)

The average housing price in District B (USD 453,980.94) is higher than that in District A (USD 226,174.77). Correspondingly, the average property tax in District B is higher than that in District A (USD 5,300.90 in District B and USD 1,655.65 in District A). Additionally, the average house size and lot size in District B are 4,055.05 square feet and 1.4568 acres respectively, larger than those in District A, which are 2,032.47 square feet and 0.6608 acres respectively. The average age of a house in District B is 47.28 years, older than in District A, where it is 12.57 years. Average attractiveness and number of bedrooms are not significantly different between the two districts.
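Per-district summaries like those in Figure 3 come from grouping the properties on the District dummy. A small stdlib sketch, assuming each property is stored as a dict keyed by the variable names used in this report; the function name and the three records are hypothetical.

```python
import statistics
from collections import defaultdict


def means_by_district(records, variables):
    """Return {district: {variable: mean}}; district 0 is A, 1 is B."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["District"]].append(rec)
    return {
        d: {v: statistics.fmean(r[v] for r in rows) for v in variables}
        for d, rows in sorted(groups.items())
    }


sample = [
    {"District": 0, "Price": 220_000, "H_Size": 2_000, "Age": 10},
    {"District": 0, "Price": 230_000, "H_Size": 2_100, "Age": 14},
    {"District": 1, "Price": 450_000, "H_Size": 4_000, "Age": 47},
]
print(means_by_district(sample, ["Price", "H_Size", "Age"]))
```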



Figure 4: Correlation between each pair of variables in both districts

Attract and N_Rooms have no significant relationship with Price. However, H_Size, L_Size, Age and P_Tax are significantly correlated with one another, which means that there is multicollinearity among the independent variables.
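The pairwise figures in a correlation table like Figure 4 are Pearson coefficients. A stdlib sketch of how such a table can be computed; the helper names and toy columns are hypothetical, not the report's data.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Return {(a, b): r} for each unordered pair of named columns."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy columns.
cols = {
    "H_Size": [1500, 2000, 2500, 3000, 3500],
    "L_Size": [0.5, 0.6, 0.9, 1.1, 1.4],
    "N_Rooms": [3, 4, 2, 4, 3],
}
for (a, b), r in correlation_table(cols).items():
    print(f"{a} vs {b}: r = {r:.3f}")
```

With n = 625, an |r| of roughly 0.08 is already significant at the 5% level, so "significant" here does not by itself mean "strong".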


5 Estimate unknown parameters and evaluate the model
We have one qualitative independent variable, District. Therefore, we add District to the linear multiple regression model as a dummy variable and create two interaction terms (H_Dist: H_Size*District, Age_Dist: Age*District). Attract and N_Rooms are eliminated because they have no relationship with Price (from the correlation analysis). We decided not to include P_Tax because the price must be known before the tax is paid, which makes it unsuitable for a price prediction model.
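The specification above (District as a 0/1 dummy plus the interaction terms H_Dist and Age_Dist) can be sketched with numpy least squares. The function names are hypothetical, and the noise-free synthetic data are generated from the Model 3 coefficients reported in the main text purely for illustration; the report's actual estimates come from SPSS.

```python
import numpy as np


def design_matrix(h_size, age, district, l_size=None):
    """Columns: intercept, H_Size, [L_Size], Age, District, H_Dist, Age_Dist."""
    h_size = np.asarray(h_size, dtype=float)
    age = np.asarray(age, dtype=float)
    district = np.asarray(district, dtype=float)  # 0 = District A, 1 = District B
    cols = [np.ones_like(h_size), h_size]
    if l_size is not None:
        cols.append(np.asarray(l_size, dtype=float))
    cols += [age, district, h_size * district, age * district]  # H_Dist, Age_Dist
    return np.column_stack(cols)


def fit_ols(X, y):
    """Ordinary least-squares coefficients beta minimising ||y - X @ beta||."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Illustration only: noise-free prices built from the Model 3 coefficients
# (H_Dist coefficient set to 0), so the fit recovers them up to rounding.
rng = np.random.default_rng(0)
n = 200
district = rng.integers(0, 2, n)
h_size = rng.uniform(1_100, 5_500, n)
age = rng.uniform(0, 80, n)
X = design_matrix(h_size, age, district)
true_beta = np.array([42025.525, 98.102, -1212.041, 22962.084, 0.0, 1025.601])
beta_hat = fit_ols(X, X @ true_beta)
print(beta_hat.round(3))
```

Passing `l_size` adds the L_Size column, giving the six-regressor initial model described in this step.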

Figure 5: Statistical results from SPSS


R² = 0.886, Adjusted R² = 0.885 and Standard Error of the Estimate = 46733.169
F-test (overall test)
$H_0: \beta_1 = \beta_2 = \cdots = \beta_6 = 0$

F = 799.862, p-value = 0.000, which is less than 0.05 (5% significance level)

We reject the null hypothesis. At the 95% confidence level, there is a significant linear relationship between the independent variables and the dependent variable.
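The overall F statistic can be cross-checked from R² alone: F = (R²/k) / ((1 - R²)/(n - k - 1)), with k regressors and n = 625 observations. Because the reported R² is rounded to three decimals, the recomputed values only approximately match the SPSS output. A minimal sketch:

```python
def f_from_r2(r2: float, n: int, k: int) -> float:
    """Overall F statistic of an OLS model with an intercept and k regressors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (k, F reported by SPSS) for Figures 5, 6 and 7; R^2 = 0.886 in all three.
checks = [(6, 799.862), (5, 961.192), (4, 1201.765)]
for k, reported in checks:
    print(f"k = {k}: F from rounded R^2 = {f_from_r2(0.886, 625, k):.1f}, SPSS = {reported}")
```

The gap of a few units is what rounding R² to 0.886 produces; it does not indicate an error in the SPSS figures.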

t-test (individual test)
We fail to reject $H_0: \beta_{\text{L\_Size}} = 0$: t = 0.334, p-value = 0.739, which is more than 0.05. At the 95% confidence level, there is no evidence of a significant linear relationship between L_Size and Price.

Now, we eliminate L_Size from the initial model.
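With 625 observations each coefficient test has more than 600 residual degrees of freedom, so its two-sided p-value is very close to the normal approximation 2(1 - Φ(|t|)). The stdlib sketch below uses that approximation in place of the exact t distribution that SPSS uses, and checks it against the t statistics reported in this appendix.

```python
import math


def two_sided_p(t: float) -> float:
    """Two-sided p-value for a t statistic, using the standard normal
    approximation 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# t statistics reported in this appendix, with the p-values SPSS printed.
for name, t, reported in [("L_Size", 0.334, 0.739),
                          ("H_Dist", 0.872, 0.384),
                          ("District", 0.460, 0.646)]:
    p = two_sided_p(t)
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{name}: p ~ {p:.3f} (SPSS {reported}) -> {decision} at the 5% level")
```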



Figure 6: Statistical results from SPSS

R² = 0.886, Adjusted R² = 0.885 and Standard Error of the Estimate = 46699.169
F-test (overall test)
$H_0: \beta_1 = \beta_2 = \cdots = \beta_5 = 0$

F = 961.192, p-value = 0.000, which is less than 0.05 (5% significance level)

We reject the null hypothesis. At the 95% confidence level, there is a significant linear relationship between the independent variables and the dependent variable.

t-test (individual test)
We fail to reject $H_0: \beta_{\text{H\_Dist}} = 0$: t = 0.872, p-value = 0.384, which is more than 0.05. At the 95% confidence level, there is no evidence of a significant linear relationship between H_Dist and Price.

Even though we fail to reject $H_0: \beta_{\text{District}} = 0$ (t = 0.460, p-value = 0.646, which is more than 0.05), we keep District in our model because it is the dummy variable: it is not practical to stipulate and fit a model that includes interaction terms but eliminates the main effect of the dummy variable.

Now, we eliminate H_Dist from the model.
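The elimination rule applied in this appendix (drop the least significant term with p > 0.05, but never drop a main effect while an interaction containing it remains) can be written as a small helper. The function name and the `parents` mapping are hypothetical; the p-values are the Model 2 and Model 3 values from Table 3.

```python
def term_to_drop(pvalues, parents, alpha=0.05):
    """Return the least significant droppable term, or None if none qualifies.

    pvalues: {term: p-value} for the slope terms currently in the model.
    parents: {interaction: (main_effect, ...)}; a main effect is protected
    while any interaction that contains it is still in the model.
    """
    protected = {m for inter, mains in parents.items() if inter in pvalues for m in mains}
    candidates = {t: p for t, p in pvalues.items() if p > alpha and t not in protected}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


parents = {"H_Dist": ("H_Size", "District"), "Age_Dist": ("Age", "District")}
model2 = {"H_Size": 0.000, "Age": 0.000, "H_Dist": 0.384,
          "Age_Dist": 0.004, "District": 0.646}
model3 = {"H_Size": 0.000, "Age": 0.000, "Age_Dist": 0.004, "District": 0.031}
print(term_to_drop(model2, parents))  # H_Dist
print(term_to_drop(model3, parents))  # None
```

In Model 2, District has the largest p-value but is protected by Age_Dist, so H_Dist is the term removed, matching the step above; in Model 3 nothing qualifies and elimination stops.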


Figure 7: Statistical results from SPSS

R² = 0.886, Adjusted R² = 0.885 and Standard Error of the Estimate = 46690.584
F-test (overall test)
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$

F = 1201.765, p-value = 0.000, which is less than 0.05 (5% significance level)

We reject the null hypothesis. At the 95% confidence level, there is a significant linear relationship between the independent variables and the dependent variable.


t-test (individual test)
We reject all null hypotheses ($H_0: \beta_j = 0$ for the intercept, H_Size, Age, Age_Dist and District).

All p-values are less than 0.05. At the 95% confidence level, each independent variable has a significant linear relationship with Price.

6 Prediction model

$\widehat{Price} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age} + 1025.601\,\text{Age\_Dist} + 22962.084\,\text{District}$
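Because Age_Dist = Age × District, substituting the dummy values into the Model 3 equation gives the two district equations. For District B, the intercept and the Age slope each absorb one extra coefficient:

$$
\begin{aligned}
\text{District A } (\text{District}=0):\quad \widehat{Price} &= 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age} \\
\text{District B } (\text{District}=1):\quad \widehat{Price} &= (42025.525 + 22962.084) + 98.102\,\text{H\_Size} + (-1212.041 + 1025.601)\,\text{Age} \\
&= 64987.609 + 98.102\,\text{H\_Size} - 186.440\,\text{Age}
\end{aligned}
$$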

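Prediction equation for District A (District = 0):
$\widehat{Price} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age}$

Prediction equation for District B (District = 1):
$\widehat{Price} = 64987.609 + 98.102\,\text{H\_Size} - 186.44\,\text{Age}$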
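The sketch below shows one way to package the prediction model together with the relevant ranges from the Recommendations for forecasting. The function name `predict_price` and the choice to raise `ValueError` outside the relevant range are assumptions; the coefficients are the Model 3 estimates reported above.

```python
# Model 3 coefficients as reported in Table 3 and the SPSS output.
B0, B_SIZE, B_AGE, B_AGE_DIST, B_DIST = 42025.525, 98.102, -1212.041, 1025.601, 22962.084

# Relevant ranges (H_Size in square feet, Age in years) from the Recommendations.
RANGES = {
    0: {"H_Size": (1_101, 3_000), "Age": (0, 25)},   # District A
    1: {"H_Size": (2_520, 5_493), "Age": (11, 80)},  # District B
}


def predict_price(h_size: float, age: float, district: int) -> float:
    """Predicted selling price (USD) from Model 3; district is 0 (A) or 1 (B)."""
    if district not in RANGES:
        raise ValueError("district must be 0 (A) or 1 (B)")
    for name, value in (("H_Size", h_size), ("Age", age)):
        low, high = RANGES[district][name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside the relevant range {low}-{high}")
    return (B0 + B_SIZE * h_size + B_AGE * age
            + B_AGE_DIST * age * district + B_DIST * district)


print(round(predict_price(2_000, 10, 0), 3))  # District A house
print(round(predict_price(3_000, 20, 1), 3))  # District B house
```

Refusing to extrapolate, rather than silently returning a number, is one way to enforce the recommendation that the equations are only reliable inside the observed ranges.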
