
2011 10th Mexican International Conference on Artificial Intelligence

Application of artificial neural networks to predict the selling price in the real estate valuation process
Youness El Hamzaoui and José Alfredo Hernández Pérez
Centro de Investigación en Ingeniería y Ciencias Aplicadas (CIICAp), UAEM. Av. Universidad No. 1001, Col. Chamilpa, C.P. 62209, Cuernavaca, Morelos, México. {youness, alfredo}@uaem.mx

Abstract- An artificial neural network (ANN) approach was applied to develop a mathematical model that predicts the sales price of residential properties. The study is based on the evaluation of home sales in Casablanca, Kingdom of Morocco, North Africa. A feedforward network with one hidden layer was trained using an original residential property valuation database. The ANN was obtained from 148 sets of input-output patterns using the backpropagation algorithm. For the network, the Levenberg-Marquardt learning algorithm, the hyperbolic tangent sigmoid transfer function and the linear transfer function were used. The best fit to the training data set was obtained with an ANN architecture composed of five neurons in the hidden layer, which made it possible to predict the sales price of homes. The model gave good predictions, with a high correlation coefficient (R2 = 0.952). Also, the simulations of the validation data set were in good agreement with the original data. It is suggested that the new ANN model could be used as a tool for the reliable prediction of selling price values.

Keywords: Artificial neural networks; Backpropagation algorithm; Selling price; Real estate valuation.

I. INTRODUCTION

The market approach has been used for many years and is still the most common method used by appraisers to determine the value of a property. This approach determines the value through the analysis of past and recent sales of comparable properties. It is based on the hypothesis that a well-informed buyer would not pay more for a property than the acquisition cost of a property available on the market with the same use, taking into account adjustment factors such as the period of time the property was put up for sale, the geographical location, the nature, the age, the general state and the intended use [1]. However, since comparable properties are similar but not identical in their characteristics (qualitative and quantitative variables), the appraiser is forced to resort to a process of correction or adjustment of the comparables found, sometimes using more or less successful logical-mathematical expressions, and often simply using empirical descriptions (better, much better, really better, slightly worse, worse, much worse, really worse, etc.) translated by the appraiser into numbers (usually percentages), especially when qualitative variables are involved, all this in order to bring the selected comparables as close as possible to the object of the appraisal [2]. The problem or deficiency of this approach is the high level of subjectivity introduced when appraisers use their own criteria for adjustment by applying

different correction factors to a series of variables, especially if they are qualitative (not specified by numeric values). Finally, a vast majority of appraisers are not even concerned with verifying whether, after having completed the adjustment process according to their own criteria, the comparables really were comparable, i.e., whether the factors applied to their variables bring them mathematically closer to each other and to the property that is the subject of the appraisal. This subjectivity in conducting the market approach undoubtedly affects the accuracy of the calculated value of the property [2].

In the early 1980s, the massive accessibility of computers containing statistical packages coincided with the first attempt to improve the market methodology by means of multiple regression techniques. This tool allowed the references to be self-corrected, removing the subjectivity of the appraiser's criterion. While multiple regression aims to eliminate the subjectivity of the appraisers, the method has a flaw: it works with rigid mathematical models and processes qualitative variables incorrectly. Another attempt to eliminate subjectivity in the valuation process is the implementation of artificial intelligence [3], as an imitation of the functioning of the brain, where artificial neural networks (ANN) are considered a powerful data modeling tool because of their ability to represent nonlinear problems through training, learning and generalization. ANN are now commonly used in many research areas, such as market response [4], residential property valuation [5] and chemistry [6]. They are also used for analyzing relations among economic and financial phenomena, forecasting, data filtration, generating time series, and optimization [7].

Most of the work thus far in the field of real estate valuation has taken a comparative approach between ANN and multiple regression, focusing on the comparison of the error values calculated by both systems. In all cases the results clearly place the ANN first, with an error rate between 5% and 10%, while multiple regression yields between 10% and 15%. Although in some cases the error results are very similar, researchers agree that the ANN is characterized by greater precision with respect to multiple regression. In September 2005 the article "Aplicación de las redes neuronales al campo de la valoración inmobiliaria" was published, in which the author, Lara Cabeza, analyzes the influence of the zone variable (the spatial location of the property within the city) on the value of the property through an

ANN. This study made it possible to determine the rate of appreciation or economic depreciation of the property studied. As is well known, the estimation of this coefficient involves many factors whose weighting and quantification are difficult because, in most cases, qualitative factors are related to each other in different ways (social, cultural and economic factors, the history of the area, the impact of political decisions, etc.). Given the ability of ANN to analyze qualitative and quantitative variables simultaneously, the author carried out an experiment to implicitly estimate the depreciation-appreciation coefficient, taking as the hypothesis to be tested that the ANN must be capable of assigning to a property a price weighted according to its location within predefined areas and according to the land value in them. As input variables, the author selected zones [8] whose demarcation took into account: 1. the social, cultural and economic characteristics of the population; 2. land uses; 3. the type of construction. The age of construction was not selected as a variable, taking into account two factors: a) the sources for the age of the buildings are doubtful; and b) the conclusions of the study by Quang Do and Grudnitski show that the age of a building is inversely proportional to its value only during the first two decades of its life, and after this period the two variables become directly related [2]. The variables building size, number of bedrooms, number of bathrooms, number of parking spaces and conservation status were considered quantitative variables, while the nine variables for the nine areas of the city were considered qualitative variables [9]. The objective of the study was to quantify the influence that its location within a given space has on the price of a property, as a result of the greater or lesser value of the land on which it stands. The author concluded that the ANN was able to assign to the test properties a price weighted according to their location in the preset areas, with a mean relative difference of 4.07%. Some relative percentage differences are statistically excessive and therefore deserve separate analysis; in order to find the reasons for these small anomalies, the areas in which they appeared were analyzed in detail. The main contribution of the ANN consists in the handling of the qualitative variable "zone", which shows its advantage over other valuation techniques such as regression, since the "zone" factor does not need any numerical weighting associated with it to estimate the final value. The ANN is able to establish its own criteria, based on learning, when weighting each of the zones, assigning a price determined on the basis of the real estate values used in the sample; it is the ANN itself which determines the quantitative influence, or weight, that qualitative variables such as the "zone" have on the final value of the housing. When the ANN is run without the nine "zone" variables, it has problems in establishing a

reliable learning pattern, because it receives contradictory input information: properties with equal features correspond to different outputs (different market values). To address this problem, in this work the Levenberg-Marquardt learning algorithm, the hyperbolic tangent sigmoid transfer function and the linear transfer function are used, taking into account only uncorrelated variables: localization, reference urban proximity, number of levels, life in months, general rental units, salable surface, number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction. Recently, the application of neural networks to real estate appraisal problems has continued to expand. The present investigation discusses the use of multilayer feedforward neural networks with the Levenberg-Marquardt learning algorithm to predict the selling price of houses, through an example of its application in real estate valuation.

II. MATERIALS AND METHODS

Practical case study
Determination of the values of houses in the delegation of Casablanca, Kingdom of Morocco: given a set of variables that characterize the houses, the network is trained to determine the values of the dwellings.

Database
A database of 148 dwellings located in 35 neighborhoods was made available. The valuations of the buildings were made in Casablanca, Morocco, between February 9 and December 26, 2007.

Localization
Refers to the spatial location of the property within the delegation. The spatial unit is the minimum area determined by a single zip code (CP). The database contains 35 different zip codes for the 35 neighborhoods. Each code was assigned a number, so that each neighborhood is represented in the database by a number from 1 to 35.

Reference urban proximity
Identified according to the location of the main municipal features and the proximity of the valued property to the recognized economic center:
Downtown (1): Areas usually limited by primary routes and defined by the authority as central.
Intermediate (2): Proximity defined from primary routes, generally limited by intermediate-speed roadways.
Peripheral (3): Limited access and through roads, mostly recognized as a growing urban area immediately adjacent to the city.
Expansion (4): Area recognized by the authorities as having growth potential. In many cases land use is not defined and the area is in the process of being recognized as part of the city.
Rural (5): Recognized by the authorities as agricultural, with or without provision of services.


Number of levels
Indicates the number of levels (mezzanines) that make up the building, using ranges corresponding to those established by the City Treasurer: 1 to 2 levels: 2; 3 to 5: 5; 6 to 10: 10; 11 to 15: 15; 16 to 20: 20; 21 or more: 99.

Life in months
Determined on the basis of the difference between the expected life and the age of the main building type.

General rental units
Reflects the units that are linked by the structure to the property under study.

Salable surface
Refers to the floor area (private area of the building space) plus accessory areas (terraces, courtyards, service rooms, covered parking).

The remaining variables are the number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction. We stress again that these variables are not correlated.

Artificial neural networks
Neurons are grouped into distinct layers and interconnected according to a given architecture. As in nature, the network function is determined largely by the connections between elements (neurons); each connection between two neurons has a weight coefficient attached to it. The standard network structure for function approximation is the multilayer perceptron (or feedforward network). The feedforward network often has one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn nonlinear and linear relationships between input and output vectors, and the linear output layer lets the network produce values outside the -1 to +1 range [10]. For the network, the notation appropriate for two-layer networks is used [11]. The number of neurons in the input and output layers is given, respectively, by the number of input and output variables of the process under investigation. In this work a feedforward network is proposed whose input layer consists of thirteen variables (localization, reference urban proximity, number of levels, life in months, general rental units, salable surface, number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction), and whose output layer contains one variable, the selling price (SP). The optimal number of neurons

in the hidden layer is difficult to specify and depends on the type and complexity of the task; this number is usually determined iteratively. Each neuron in the hidden layer has a bias b1 (threshold), which is added to the weighted inputs to form the neuron input n1, as shown in Equation (1). This sum is the argument of the transfer function f:

$n_1 = W_i(1,1)\,In_1 + W_i(1,2)\,In_2 + \dots + W_i(1,K)\,In_K + b_1$  (1)

The coefficients associated with the hidden layer are grouped into the weight matrix Wi and the bias vector b1. The output layer computes the weighted sum of the signals provided by the hidden layer, and the associated coefficients are grouped into Wo and b2. Using matrix notation, the network output is given by Equation (2):

$Out = g\left[\,W_o\, f\left(W_i\, In + b_1\right) + b_2\,\right]$  (2)

Hidden-layer neurons may use any differentiable transfer function to generate their output. In this work, a hyperbolic tangent sigmoid transfer function (Tansig) was used for f in the hidden layer and a linear transfer function (Purelin) was used for g in the output layer [11]. Substituting these transfer functions into Equation (2) gives Equation (3):

$SP = \mathrm{Purelin}\left[\,W_o\, \mathrm{Tansig}\left(W_i\, In + b_1\right) + b_2\,\right]$  (3)

The f and g transfer functions are given in (4) and (5), respectively:

$\mathrm{Tansig} = f(n_s) = \dfrac{2}{1 + e^{-2 n_s}} - 1$  (4)

$\mathrm{Purelin} = g(n_s) = n_s$  (5)

where S is the number of neurons in the hidden layer, K is the number of neurons in the input layer (K = 13), l is the number of neurons in the output layer (l = 1), and Wi, Wo and b1, b2 are the weights and biases, respectively. The resulting model equation is not complex, because it is made up of simple arithmetic operations, and therefore it can be used for online estimation in real estate valuation processes. In this work, a multilayer feedforward ANN with one hidden layer was used for all data sets. The data sets were obtained from the sales prices of residential properties in Casablanca, Kingdom of Morocco, North Africa. The ANN was trained using the backpropagation algorithm. All calculations were carried out with the Matlab mathematical software and its ANN toolbox.
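To illustrate how simple the resulting arithmetic is, the following minimal Python/NumPy sketch (not the authors' Matlab code) evaluates the model of Equations (2), (4) and (5) for the 13-input, five-neuron, one-output architecture used here; the random parameter values are placeholders for the fitted weights and biases reported later in Table 2.

```python
import numpy as np

def tansig(n):
    # Hyperbolic tangent sigmoid transfer function, Equation (4)
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def purelin(n):
    # Linear transfer function, Equation (5)
    return n

def ann_output(In, Wi, b1, Wo, b2):
    # Out = g[ Wo f( Wi In + b1 ) + b2 ], Equation (2)
    hidden = tansig(Wi @ In + b1)      # hidden-neuron outputs, argument as in Eq. (1)
    return purelin(Wo @ hidden + b2)   # normalized selling-price estimate

# Placeholder parameters with the shapes used in this work (S = 5, K = 13, l = 1)
rng = np.random.default_rng(0)
Wi, b1 = rng.normal(size=(5, 13)), rng.normal(size=5)
Wo, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)
print(ann_output(rng.random(13), Wi, b1, Wo, b2))
```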


Neural Network Learning
A learning (or training) algorithm is defined as a procedure for adjusting the coefficients (weights and biases) of a network in order to minimize an error function (usually a quadratic one) between the network outputs, for a given set of inputs, and the correct (already known) outputs. If smooth nonlinearities are used, the gradient of the error function can be computed by the classical backpropagation procedure [12]. Earlier learning algorithms used this gradient directly in a steepest-descent optimization, but recent results have shown that second-order methods are far more effective. In this work, the Levenberg-Marquardt optimization procedure of the Matlab Neural Network Toolbox (Demuth and Beale [11]) was used. This algorithm is an approximation of Newton's method, designed to approach second-order training speed without having to compute the Hessian matrix [13]. Although the computations involved in each iteration are more complex than in the steepest-descent case, the convergence is faster, typically by a factor of 100. The root mean square error (RMSE), calculated from the experimental values and the network predictions, is used as the criterion of model adequacy (see Fig. 1). Consequently, the RMSE was used as the error function that measures the performance of the network, according to Equation (6):

$RMSE = \sqrt{\dfrac{\sum_{q=1}^{Q}\left(y_{q,pred} - y_{q,exp}\right)^{2}}{Q}}$  (6)

where Q is the number of data points, y_{q,pred} is the network prediction and y_{q,exp} is the experimental response for data point q.
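For illustration only, the sketch below performs one coefficient update for this one-hidden-layer tanh network by plain gradient descent on the squared error; it is a didactic stand-in, not the Levenberg-Marquardt procedure of the Matlab toolbox used in the paper, and all array names and values are assumed placeholders.

```python
import numpy as np

def train_step(X, y, Wi, b1, Wo, b2, lr=1e-3):
    """One gradient-descent update of the weights and biases.

    X : (Q, 13) normalized inputs; y : (Q,) normalized selling prices.
    Wi : (5, 13), b1 : (5,), Wo : (5,), b2 : scalar.
    """
    Q = len(y)
    # Forward phase: propagate the inputs to the output neuron
    H = np.tanh(X @ Wi.T + b1)           # (Q, 5) hidden activations (Tansig)
    y_hat = H @ Wo + b2                  # (Q,) linear (Purelin) outputs
    err = y_hat - y
    rmse = np.sqrt(np.mean(err ** 2))    # error criterion, Equation (6)

    # Backward phase: propagate the error and adjust the coefficients
    dWo = 2.0 / Q * (err @ H)
    db2 = 2.0 / Q * err.sum()
    dH = np.outer(err, Wo) * (1.0 - H ** 2)      # through the tanh derivative
    dWi = 2.0 / Q * (dH.T @ X)
    db1 = 2.0 / Q * dH.sum(axis=0)

    return Wi - lr * dWi, b1 - lr * db1, Wo - lr * dWo, b2 - lr * db2, rmse

# Example usage with placeholder data and parameters:
rng = np.random.default_rng(1)
X, y = rng.random((74, 13)), rng.random(74)
Wi, b1, Wo, b2 = rng.normal(size=(5, 13)), np.zeros(5), rng.normal(size=5), 0.0
for _ in range(200):                              # 200 iterations per run, as in the paper
    Wi, b1, Wo, b2, rmse = train_step(X, y, Wi, b1, Wo, b2)
```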

ANN model development
In this study, one layer of hidden neurons was used. The network was a feedforward neural network trained with the backpropagation algorithm. The input variables to the ANN were the localization, reference urban proximity, number of levels, life in months, general rental units, salable surface, number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction. The selling price was the experimental response, or output variable. The characteristics of the input and output variables are shown in Table 1.
Table 1. Characteristics of input and output variables to the ANN model

The topology of an artificial neural network is determined by the number of layers, the number of nodes in each layer and the nature of the transfer functions. Optimization of the ANN topology is probably the most important step in the development of a model [14]. In order to determine the optimum number of hidden nodes, a series of topologies was tested in which the number of nodes was varied from 2 to 10. All ANNs were trained using the backpropagation algorithm (scaled conjugate gradient algorithm). Network training is a process by which the connection weights and biases of the ANN are adapted through a continuous process of stimulation by the environment in which the network is embedded. The primary goal of training is to minimize the error function (RMSE) by searching for a set of connection weights and biases that causes the ANN to produce outputs that are equal or close to the target values. In other words, the backpropagation algorithm minimizes the RMSE between the observed and the predicted output in the output layer through two phases. In the forward phase, the external input information signals at the input neurons are propagated forward to compute the output information signal at the output neuron. In the backward phase, modifications to the connection strengths are made based on the difference between the predicted and observed information signals at the output neuron [15].
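A sketch of such a topology search is given below. It uses scikit-learn's MLPRegressor as a convenient stand-in (scikit-learn does not provide the Levenberg-Marquardt or scaled conjugate gradient optimizers, so 'lbfgs' is used instead), and the data arrays are random placeholders for the normalized training and validation subsets.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def rmse(y_pred, y_exp):
    return np.sqrt(np.mean((y_pred - y_exp) ** 2))

rng = np.random.default_rng(2)
X_train, y_train = rng.random((74, 13)), rng.random(74)   # placeholder training subset
X_val, y_val = rng.random((37, 13)), rng.random(37)       # placeholder validation subset

best = None
for n_hidden in range(2, 11):                 # vary the hidden nodes from 2 to 10
    for restart in range(5):                  # a few random restarts per topology
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation='tanh',
                           solver='lbfgs', max_iter=200, random_state=restart)
        net.fit(X_train, y_train)
        score = rmse(net.predict(X_val), y_val)
        if best is None or score < best[0]:
            best = (score, n_hidden)

print("selected number of hidden neurons:", best[1])
```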

Fig. 1. Recurrent network architecture for the SP values and the procedure used for neural network learning.


In total, 148 experimental sets were used to feed the ANN structure. The data sets were divided into training, validation and test subsets containing 74, 37 and 37 samples, respectively. The validation and test sets, used to evaluate the validation and modelling power of the nets, were randomly selected from the experimental data. All samples were normalized to the 0-1 range; that is, all of the input data sets Xi (from the training, validation and test sets) were scaled to a new value xi as follows:

$x_i = \dfrac{X_i - X_{min}}{X_{max} - X_{min}}$  (7)
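A brief sketch of this preprocessing step (Equation (7)) and of the 74/37/37 split is shown below; the raw 148 x 13 matrix X and the price vector y are random placeholders here, not the actual valuation database.

```python
import numpy as np

def minmax_scale(X):
    # Scale each column to the 0-1 range, Equation (7)
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    return (X - X_min) / (X_max - X_min)

rng = np.random.default_rng(3)
X = rng.random((148, 13))          # placeholder for the 148 samples x 13 input variables
y = rng.random(148)                # placeholder selling prices

Xs = minmax_scale(X)
idx = rng.permutation(148)         # random assignment to the three subsets
train_idx, val_idx, test_idx = idx[:74], idx[74:111], idx[111:]
```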

The final topology was obtained after 105 runs of 200 iterations, each starting from random initial weights. For each run, the network error was computed as a function of the number of neurons in the hidden layer. It was found that the network performance stabilized after the inclusion of five neurons in the hidden layer. Therefore, based on the behaviour of the RMSE function, five neurons in the hidden layer and a three-layer feedforward backpropagation neural network were used for modeling the process (as depicted in Fig. 2).

Fig. 2. Model for the prediction of SP values

Results and discussion
The neural network model, developed according to Fig. 2, involved five neurons (S = 5) in the hidden layer [70 weights (Wi = 65, Wo = 5) and 6 biases (b1 = 5 and b2 = 1)] to predict the SP values (R2 = 0.952).

Learning database for the model
In the learning database, in terms of the RMSE versus the iteration number, the algorithm was worked out for one to ten neurons in the hidden layer. The obtained results (data not shown) proved that the typical learning error decreased when the number of neurons in the hidden layer increased. Nevertheless, one of the problems that occurs during feedforward neural network training is over-fitting (Hernández-Pérez et al., 2004) [16]. The comparison of the RMSE calculated for the learning and testing databases is a good criterion for optimizing the number of iterations and avoiding over-fitting. In this neural network, the RMSE showed that, for five neurons in the hidden layer, the testing-database value was small with respect to the learning database. Therefore, according to the RMSE results, the optimal number of neurons in the hidden layer is five. Fig. 3 shows the experimental and simulated SP values for the learning database (Fig. 3A) and for the testing database (Fig. 3B). It can be observed in Fig. 3A that the simulated results have the expected relationship with respect to the experimental data, whereas Fig. 3B shows that the SP prediction was well correlated (R2 = 0.952). Table 2 gives the parameters (Wi, Wo, b1 and b2) of the best fit for five neurons in the hidden layer. These parameters are used in the proposed model to simulate the SP values. Consequently, the proposed ANN model is given by Equation (8):

$SP = \sum_{s=1}^{5} W_o(1,s)\left[\dfrac{2}{1 + \exp\left(-2\left(\sum_{k=1}^{13} W_i(s,k)\, In(k) + b_1(s)\right)\right)} - 1\right] + b_2(1)$  (8)

where In(k) is the k-th normalized input variable, s indexes the hidden neurons, and the weights Wi, Wo and biases b1, b2 are those listed in Table 2.


Table 2. Weights and biases for the ANN model


Wi(S,K) (rows: input variable k = 1,…,13; columns: hidden neuron s = 1,…,5)
32.43357   -0.1164    -67.96       1.82006   -2.74206
77.29743    0.39949  -157.97142    2.08992   -3.5133
52.6139     0.48036  -109.11836    1.4413    -4.07215
60.53022    0.162    -189.79769    5.63675   -2.92544
19.05845   -0.1827    -24.134     -0.64262    0.22858
30.67138    1.34604   -92.759      0.95871   -0.77462
49.59305    0.16269  -195.85018    6.27244   -6.1403
 0.14084    0.52318  -130.47475    2.87163   -4.57615
 5.90602    0.18023   -33.4585     1.11468   -1.03088
 3.02771    0.2571   -120.12358    2.22834   -2.24268
74.1462     0.13607  -124.31745    3.64085   -1.52555
103.92004   0.04799  -149.83055    3.118     -5.47035
103.92004   0.82904  -183.8434     4.3353    -6.1368

Wo(L,S): 346623.57   1444340.9   -346419.04   346636.18   -346642.77

b1(S,1): 154.38317   -2.59281   -255.46434   7.49746   -6.01217

b2(L,1): 346645.74

Validation of the proposed model
Fig. 4 depicts the ability of the model to predict the selling price for different values of the parameters (localization, reference urban proximity, number of levels, life in months, general rental units, salable surface, number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction). This figure compares the simulated results with the experimental data for the test database. It can be seen that the model succeeded in predicting the experimental selling prices, which shows the usefulness of the artificial neural network for simulating the selling price. The model is not complex, because the simulation is carried out by simple arithmetic operations, and therefore it can be used for online estimation of the selling price. Thus, the network was tested and validated by comparing its predicted output values with the experimental data using an independent set of data (as shown in Fig. 4).

Fig. 3. Experimental and simulated data: A) for the learning database and B) for the testing database

Testing database for the proposed model
A statistical test of slope 1 and intercept 0 was carried out to confirm the proposed model [17]. This test showed that the slope is 0.9897 ± 0.0949 and the intercept is 2.1968×10^4 ± 1.1954×10^5, which are statistically equal to 1 and 0, respectively, indicating a statistically significant correlation between the experimental and simulated SP values without any bias. The statistical test shows the power of the model to predict the SP values under various experimental conditions, i.e., for different values of the localization, reference urban proximity, number of levels, life in months, general rental units, salable surface, number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction. From this correlation and statistical test, it is evident that the model was successful in predicting the experimental SP values, which again shows the usefulness of the artificial neural network for predicting the selling price.
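The slope/intercept check can be reproduced along the following lines; this is a sketch with placeholder arrays rather than the actual test-set values, and the parameter uncertainties come from the covariance of an ordinary least-squares straight-line fit.

```python
import numpy as np

rng = np.random.default_rng(4)
y_exp = rng.random(37) * 1.0e6                       # placeholder experimental SP values
y_sim = 0.99 * y_exp + rng.normal(0.0, 2.0e4, 37)    # placeholder simulated SP values

coef, cov = np.polyfit(y_exp, y_sim, 1, cov=True)    # fit y_sim versus y_exp
slope, intercept = coef
slope_err, intercept_err = np.sqrt(np.diag(cov))
print(f"slope     = {slope:.4f} +/- {slope_err:.4f}")
print(f"intercept = {intercept:.4g} +/- {intercept_err:.4g}")
# The model is taken as unbiased when these intervals contain 1 and 0, respectively.
```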

Fig. 4. Experimental data and simulated curve generated with the proposed selling-price model: points are the real (experimental) data and the continuous line is the prediction.

III. CONCLUSIONS

The sales price of residential properties during the valuation process was successfully predicted by applying a three-layer neural network with five neurons in the hidden layer, trained by backpropagation with the Levenberg-Marquardt algorithm. Simulations based on the ANN model were performed in order to estimate the behavior of the system under different variables. All the parameters studied in this work (localization, reference urban proximity, number of levels, life in months, general rental units, salable surface, number of bedrooms, number of bathrooms, number of half bathrooms, number of parking spaces, elevator, valuation date and style of construction) have considerable effects on the selling price. Finally, the modeling results confirmed that the neural network model can effectively reproduce the experimental data and find the relationships among all the variables, both qualitative and quantitative, without using rigid mathematical models such as the


regression algorithm; the network is not obliged to follow a fixed series of instructions. Instead, the network works on the basis of learning: it creates its own rules and learns from its mistakes in order to predict the behavior of the process with the least possible error and obtain the optimal solution to the problem. However, the network requires some kind of pattern in order to work correctly, which is why it cannot predict lottery games or horse races, which are random processes. Another disadvantage is that networks have long training times; but as computing power increases, processing time decreases, and personal computers become more and more available, the use of ANN in real estate valuation will continue to increase.

ACKNOWLEDGMENT
Youness El Hamzaoui expresses his gratitude to the Consejo Nacional de Ciencia y Tecnología (CONACYT) for the scholarship awarded for his doctoral studies, and for the economic support received from CIICAp-UAEM for the completion of this work.

REFERENCES
[1] A. Jhon, Appraising Machinery and Equipment, McGraw-Hill Companies, ISBN 0-07-001475-2, United States of America, 1989.
[2] K. Agnieszka, Aplicación de redes neuronales artificiales en la valuación inmobiliaria, MBA Thesis, Universidad Nacional Autónoma de México, 2008.
[3] Y. El Hamzaoui, J.A. Hernández, M.C. Cruz-Chávez and A. Bassam, Search for optimal design of multiproduct batch plants under uncertain demand using Gaussian process modeling solved by heuristics methods, Chemical Product and Process Modeling, 5(1), Article 8, 2010.
[4] C.G. Dasgupta, G.S. Dispensa and S. Ghose, Comparing the predictive performance of a neural network model with some traditional market response models, International Journal of Forecasting, 10(2), (1994), 235-244.
[5] I.D. Wilson, S.D. Paris, J.A. Ware and D.H. Jenkins, Residential property price time series forecasting with neural networks, Knowledge-Based Systems, 15, (2002), 335-341.
[6] Y. El Hamzaoui, J.A. Hernández, S. Silva-Martínez, A. Bassam and C. Lizama-Bahena, Application of artificial neural networks to predict the chemical oxygen demand removal during the aqueous treatment of alazine and gesaprim commercial herbicides, Water, Air, & Soil Pollution, 2010.
[7] D.D. Hawley, J.D. Johnson and D. Raina, Artificial neural systems: a new tool for financial decision-making, Financial Analysts Journal, Nov/Dec, (1990), 63-72.
[8] J.M. Lara Cabeza, Aplicación de las redes neuronales artificiales al campo de la valoración inmobiliaria, www.mappinginteractivo.com, Revista Internacional de Ciencia de la Tierra, September 2005.
[9] V.F. Artyushkin, Neural network ensembles as models of interdependence in collective behavior, Mathematical Social Sciences, 19(2), (1990), 167-177.
[10] F. Limin, Neural Networks in Computer Intelligence, McGraw-Hill International Series in Computer Science, 1994.
[11] H. Demuth and M. Beale, Neural Network Toolbox for Matlab, Users Guide Version 3, MathWorks, MA, 1998.
[12] D.E. Rumelhart, G.E. Hinton and R.J. Williams, Learning internal representations by error propagation, Parallel Distributed Processing, 1, (1986), 318-362.
[13] M.T. Hagan and M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks, 5(6), (1994), 989-993.
[14] A. Durán and J.M. Monteagudo, Solar photocatalytic degradation of reactive blue 4 using a Fresnel lens, Water Research, 41, (2007), 690-698.
[15] M.M. Hamed, M.G. Khalafallah and E.A. Hassanien, Prediction of wastewater treatment plant performance using artificial neural networks, Environmental Modelling and Software, 19, (2004), 919-928.
[16] J.A. Hernández-Pérez, M.A. García-Alvarado, G. Trystram and B. Heyd, Neural networks for the heat and mass transfer prediction during drying of cassava and mango, Innovative Food Science and Emerging Technologies, 5, (2004), 57-64.
[17] D. Chandrasekharam and J. Bundschuh, Geothermal Energy Resources for Developing Countries, Swets & Zeitlinger B.V., A.A. Balkema, Rotterdam, The Netherlands, (2002), 195-224.

