Introduction
Regression analysis: a statistical method by which estimates are made of the value of a variable from a knowledge of the values of one or more other variables, and the errors involved in this estimating process are measured.
It is normally used in situations where the relationship between the variables is not unique.
Main types:
o Simple Linear Regression Analysis
o Multiple Linear Regression Analysis
Assumptions
The standard deviation of the error associated with the dependent variable (cost) remains constant throughout the domain.
This error is normally distributed.
The effect of any variable is always expressed in terms of a fixed cost increase or decrease, irrespective of project size or type.
Two-variable (simple) linear regression describes the relationship between two variables by fitting a straight line through the observed data.
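Expression: $y = a + bx$
o Dependent variable ($y$): the variable whose value is being estimated
o Independent variable ($x$): the factor from which the estimates are made
o Constant ($a$): the value of $y$ when the independent variable is zero (the intercept)
o Coefficient of $x$ ($b$): the slope of the straight line, $b = \tan\theta$, where $\theta$ is the angle the line makes with the $x$-axis
[Figure: regression line with the dependent variable on the vertical axis and the independent variable on the horizontal axis, showing intercept $a$ and slope $b = \tan\theta$]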
Prediction within the range of values in the dataset is known as interpolation. Prediction outside this range of the data is known as extrapolation.
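A minimal sketch of fitting $y = a + bx$ by least squares and labelling a prediction as interpolation or extrapolation. It assumes the observations are plain Python lists of paired values; the function names and the data in the usage line are hypothetical toy values, not taken from any project.

```python
from typing import Sequence


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Least squares estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope of the line
    a = mean_y - b * mean_x  # value of y when x is zero
    return a, b


def predict(a: float, b: float, x_new: float, x_data: Sequence[float]) -> tuple[float, str]:
    """Predict y at x_new and report whether this is interpolation or extrapolation."""
    kind = "interpolation" if min(x_data) <= x_new <= max(x_data) else "extrapolation"
    return a + b * x_new, kind


# Hypothetical usage: x = [1, 2, 3, 4, 5] and y = [3, 5, 7, 9, 11] give a = 1, b = 2.
a, b = fit_simple_linear([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```

A prediction at x = 8 with these data would be flagged as extrapolation, since 8 lies outside the observed range 1 to 5.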
Specification
Begins with theoretical reasoning on the relationship between the variables.
Form equations to represent the relationships between the variables. Since the population parameters are unknown, a sample is taken and the model is built with estimated values.
Estimation
The least squares estimation procedure is used most of the time. It includes a series of statistical tests to make sure that the estimated model is a good representation of the postulated relationship.
Validation
Evaluate the quality of the model on the basis of the following statistics:
o Coefficient of determination
o Standard error
o F ratio test: the ratio of the regression mean square to the residual mean square
o t ratio test: the ratio of a coefficient to its standard error
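The validation statistics listed above can be computed for a simple linear regression as sketched below. This is an illustration under stated assumptions: one explanatory variable (so the regression has 1 degree of freedom and the residuals have n - 2), a hypothetical function name, and toy data rather than project data.

```python
import math


def slr_validation_stats(x, y):
    """Coefficient of determination, standard error, F ratio and t ratio of b for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    fitted = [a + b * xi for xi in x]
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)             # explained by the line
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))     # left unexplained
    ms_reg = ss_reg / 1          # regression mean square (one explanatory variable)
    ms_res = ss_res / (n - 2)    # residual mean square
    std_error = math.sqrt(ms_res)
    se_b = std_error / math.sqrt(sxx)   # standard error of the slope
    return {
        "r2": ss_reg / (ss_reg + ss_res),
        "std_error": std_error,
        "f_ratio": ms_reg / ms_res,
        "t_ratio_b": b / se_b,
    }


# Hypothetical toy data.
stats = slr_validation_stats([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

With a single explanatory variable the t ratio of the slope squared equals the F ratio, which is a convenient consistency check.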
Forecasting
Forecasts should be satisfactory to the users. Accuracy depends on the amount of error acceptable for the model.
Multiple linear regression aims to establish a relationship between the dependent variable and several independent variables.
$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \dots + b_nx_n + e$, where $e$ is the error term.
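A minimal sketch of estimating $a, b_1, \dots, b_n$ by least squares, assuming small data sets held in plain lists. It solves the normal equations $(X^TX)\beta = X^Ty$ with Gauss-Jordan elimination; the function names and test data are hypothetical, and for real work a statistics package would normally be used instead.

```python
def solve(matrix, vector):
    """Solve matrix * beta = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [val] for row, val in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [p - factor * q for p, q in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_multiple_linear(X, y):
    """Least squares [a, b1, ..., bn] for y = a + b1*x1 + ... + bn*xn.

    X holds one row of n explanatory values per observation.
    """
    rows = [[1.0, *row] for row in X]   # leading 1.0 carries the constant a
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
coefficients = fit_multiple_linear([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8])
```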
Steps of the MLR Model
Specification
Begins with theoretical reasoning on the relationship between the variables.
Selecting a full set of explanatory (independent) variables.
Estimation
Determining the correlation coefficients between all possible pairs of variables.
Resolving multicollinearity.
Eliminating non-significant variables one at a time until all the remaining variables are significant:
o Use of the t-ratio: a large t-ratio is desirable.
o Use of the F-ratio: tests the significance of the overall dependence of $y$ on the variables ($x_1, x_2, \dots, x_n$).
o The coefficients of the regression model are estimated using the method of least squares, chosen for its simplicity.
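The elimination step can be sketched as backward elimination on t-ratios: fit, drop the explanatory variable with the smallest absolute t-ratio if it falls below a cut-off, refit, and repeat. The cut-off of 2.0 is an assumed rule of thumb (the text only says a large t-ratio is desirable), the F-ratio check is not shown, and all names and data are hypothetical.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: check for multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * w for u, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_with_t_ratios(columns, y):
    """Least squares fit of y on a constant plus the named columns.

    Returns (coefficients, t_ratios), both keyed by name, including "constant".
    """
    names = ["constant", *columns]
    n, k = len(y), len(names)
    if n <= k:
        raise ValueError("need more observations than coefficients")
    rows = [[1.0, *(columns[c][i] for c in columns)] for i in range(n)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    s2 = sse / (n - k)  # residual mean square
    if s2 == 0:
        raise ValueError("perfect fit: t-ratios are undefined")
    t = [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]  # coefficient / standard error
    return dict(zip(names, beta)), dict(zip(names, t))


def backward_eliminate(columns, y, t_critical=2.0):
    """Remove the weakest variable one at a time until every |t| >= t_critical."""
    remaining = dict(columns)
    while remaining:
        coefs, t = fit_with_t_ratios(remaining, y)
        weakest = min(remaining, key=lambda c: abs(t[c]))
        if abs(t[weakest]) >= t_critical:
            return coefs, t
        del remaining[weakest]  # drop one variable, then refit
    return fit_with_t_ratios(remaining, y)
```

In a hypothetical run where y depends only on x1 and x2 is unrelated noise, x2 is removed after the first fit and the refitted model keeps only the constant and x1. The constant is never a candidate for removal.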
Validation
Before practical use in the construction industry, the model is validated using data from another actual project.
Forecasting
If validation is successful, the model is put to practical use on construction projects.
Case Study: Consider the possible sample values of bricklayer hours and areas of brickwork from 10 fictitious contracts in the following table
The scatter is caused by factors other than area that affect the hours required.
o Bricklayer-hours: dependent (response) variable
o Area of brickwork: independent (regressor) variable
To avoid individual judgement in constructing the line, the method of least squares is used. In fitting the regression line to a set of data, several parameters are estimated, and these need to be tested for significance before being accepted. As an overall guide to the strength of association between the two variables, the correlation coefficient is calculated.
Perfect correlation: r = 1 (r = -1 for a perfect negative relationship)
The result shows an excellent degree of correlation, which cannot be found by using one variable only.
The standard error of estimate, the anticipated difference between the actual values and the values predicted by the regression line, should also be calculated.
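The correlation coefficient and the standard error of estimate can be sketched as follows. This assumes paired lists of observations and, for the standard error, predictions from a fitted two-parameter line (hence n - 2 degrees of freedom); the function names and data are hypothetical.

```python
import math


def correlation_coefficient(x, y):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_estimate(actual, predicted):
    """Typical gap between actual values and those predicted by a fitted line y = a + bx."""
    n = len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / (n - 2))


# Hypothetical toy data; the line fitted to it is y = 2.2 + 0.6x.
r = correlation_coefficient([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
see = standard_error_of_estimate([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2])
```

For a simple linear regression, r squared equals the coefficient of determination.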
Explanatory variables considered:
o Type of pour
o Total volume (m3)
o Number of trucks on job
o Average volume of load (m3)
o Start time
o Average truck cycle time (minutes)
o Number of loads
o Weather
o Concrete mix
Calculating the correlation coefficients between all pairs of explanatory variables.
Resolving multicollinearity by removing one variable (total volume) of the two highly correlated variables (total volume and number of loads).
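The multicollinearity screen can be sketched by computing the correlation coefficient for every pair of explanatory variables and flagging pairs above a cut-off. The cut-off of 0.8, the function names and the three columns of numbers are hypothetical assumptions, not values from the case study.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    flagged = []
    for p, q in combinations(columns, 2):
        r = pearson(columns[p], columns[q])
        if abs(r) >= threshold:
            flagged.append((p, q, r))
    return flagged


# Hypothetical values for three explanatory variables.
columns = {
    "total_volume": [40, 60, 30, 80, 50],
    "number_of_loads": [7, 10, 5, 13, 8],
    "number_of_trucks": [3, 2, 3, 4, 2],
}
pairs = highly_correlated_pairs(columns)
# Keep one variable from the flagged pair; total volume is dropped, mirroring the case study.
reduced = {name: values for name, values in columns.items() if name != "total_volume"}
```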
Estimating the partial regression coefficients and the corresponding t-statistics from the regression on actual productivity for all eight explanatory variables. Insignificant variables have small absolute t-statistics and should be eliminated.
Carrying out two further runs, eliminating the insignificant variables concrete mix (t-statistic = 0.97) and start time (t-statistic = 1.72) from the regression model.
An important assumption made is that the variability of the data does not change for different levels of the response or explanatory variables.
o This is checked using residual plots.
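A residual plot is a visual check; a rough numerical stand-in is to compare the mean squared residual for the larger fitted values against that for the smaller ones. This is a swapped-in heuristic, not the procedure used in the source, and the function name, data and any cut-off for "too large" a ratio are hypothetical.

```python
def residual_spread_ratio(observed, fitted):
    """Mean squared residual in the upper half of fitted values divided by that in the lower half.

    A ratio near 1 is consistent with constant variability; a ratio far from 1
    suggests the spread changes with the level of the response (a "funnel").
    """
    pairs = sorted(zip(fitted, observed))        # order observations by fitted value
    residuals = [o - f for f, o in pairs]
    half = len(residuals) // 2
    if half == 0:
        raise ValueError("need at least two observations")
    lower, upper = residuals[:half], residuals[-half:]

    def mean_square(values):
        return sum(e * e for e in values) / len(values)

    return mean_square(upper) / mean_square(lower)


# Hypothetical data: constant spread, then a spread that grows with the fitted value.
steady = residual_spread_ratio([1.1, 1.9, 3.1, 3.9, 5.1, 5.9], [1, 2, 3, 4, 5, 6])
funnel = residual_spread_ratio([1.1, 1.9, 3.1, 3.0, 6.0, 5.0], [1, 2, 3, 4, 5, 6])
```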
Constructing a multiple linear regression model for the actual productivity of a single-server concrete system.
$P_{actual} = 1.31T_p + 1.75V_a + 0.56T_n + 0.59W - 0.01C_t - 0.37L_n - 6.95$
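Applying the fitted model is a single weighted sum. In this sketch the minus signs on the last three terms are a reconstruction (they were lost in the extracted text), and the symbol meanings are inferred from the remaining explanatory variables: $T_p$ type of pour, $V_a$ average volume of load, $T_n$ number of trucks, $W$ weather, $C_t$ average truck cycle time, $L_n$ number of loads. How categorical inputs such as type of pour and weather are coded as numbers is not stated, so any values passed in are hypothetical.

```python
def predict_productivity(tp, va, tn, w, ct, ln):
    """Predicted actual productivity from the derived regression model.

    ASSUMPTIONS: signs of the C_t, L_n and constant terms are reconstructed,
    and the numeric coding of tp (type of pour) and w (weather) is unknown.
    """
    return (1.31 * tp + 1.75 * va + 0.56 * tn + 0.59 * w
            - 0.01 * ct - 0.37 * ln - 6.95)


# Arbitrary illustrative inputs, not observations from the case study.
estimate = predict_productivity(tp=1, va=6, tn=2, w=1, ct=60, ln=10)
```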
Validation is done using actual concrete pours from another wastewater project in Scotland, carried out by a different contractor (Project B). The actual productivities achieved on 32 operations observed on Project B are compared with the productivities predicted by the derived regression model.
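One common way to summarise such a comparison is the mean absolute percentage error between actual and predicted values. The source does not say which measure was used, so this metric, the function name and the numbers in the usage line are assumptions for illustration only.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty lists")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical actual and predicted productivities for two operations.
error_pct = mean_absolute_percentage_error([10.0, 20.0], [11.0, 18.0])
```

Whether a given percentage error is acceptable depends on the amount of error the users of the forecast are prepared to accept.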
Drawbacks of regression
Multicollinearity
If the explanatory variables in a multiple regression are correlated, and the correlation coefficient (positive or negative) is high, it is difficult to separate their individual effects on the dependent variable. This leads to poorly estimated partial regression coefficients.
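The severity of this problem is often quantified with the variance inflation factor (VIF), a standard diagnostic that the text does not mention and which is swapped in here for illustration. With exactly two explanatory variables it reduces to $1 / (1 - r^2)$, where $r$ is their correlation coefficient; the function names and data are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has only these two explanatory variables."""
    r = pearson(x1, x2)
    if abs(r) >= 1.0:
        return math.inf  # perfectly correlated: separate effects cannot be estimated
    return 1.0 / (1.0 - r * r)


# Hypothetical data: a strongly correlated pair and an uncorrelated pair.
high = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])
low = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
```

A VIF of 1 means the variables are uncorrelated; the larger it grows, the more the variance of each partial regression coefficient is inflated.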
Omitted variables
If independent variables that have significant relationships with the dependent variable are left out of the model, the results will not be satisfactory. For example, factors such as location and quality cannot be quantified.
There may also be bias in selecting the independent variables.
Endogeneity
Simple linear and multiple regression using least squares can be done in some spreadsheet applications and on some calculators. Specialized regression software has been developed for use in fields such as survey analysis and neuroimaging. The Constructive Cost Model (COCOMO) is an example of an algorithmic software cost estimation model developed using a basic regression formula.
Conclusions
Regression analysis falls under the algorithmic cost models, which use mathematical formulae linking costs or inputs with metrics to produce an estimated output.
It is used not only for estimating costs but also for forecasting productivity, time and other parameters.
It is a widely used method, not just in the construction industry.
When there is only one major factor affecting the response, SLR can be used.
When there is more than one major factor affecting the response, MLR can be used.
The method has several drawbacks and limitations.
Knowledge of using regression analysis in specialized cost estimation software, in spreadsheets and on calculators is beneficial for the Quantity Surveyor.