
Regression Analysis can Help You Create a Computerized Estimating System

Lawrence Mann, Jr., Associate Professor, Louisiana State University

Many estimators today are beginning to think in terms of reducing the drudgery in their work by resorting to a computer program. A considerable portion of the data used in estimating procedures is either in tabular or graphic form. Although it is possible to put tables into a computer memory, much less space is taken up if an equation is evolved. Regression analysis can convert tables and graphs into mathematical expressions.

Linear Equations

For illustrative purposes let us use the following hypothetical data, letting horsepower = y and cost = x.

Table 1
2300 Volt, 3600 rpm Explosion Proof Electric Motors

  y (horsepower)    x (cost)
       150            4,000
       200            5,300
       250            7,300
       300            9,200
       350           10,700
       400           12,000
       450           13,500
       500           15,000

For ease of computation we will code the data by dividing horsepower by 100 and cost by 1000. This will give us smaller numbers with which to deal. For the present, let us assume that we know the data to be a straight line, that is, linear. From analytic geometry, we learned that the expression for a straight line is

$$y = mx + d \tag{1}$$

where m is the slope of the line and d is the intercept on the y axis. If we plot the above data, our graph is shown in Figure 1.

[Figure 1]

Since we have assumed that we have a straight line, we must get the values of m and d in equation (1).

$$m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \tag{2}$$

where n = number of points of data, 8 in this case, and

$$d = \frac{1}{n}\left[\sum y - m\sum x\right] \tag{3}$$

Equations (2) and (3) are called the normal linear regression equations. Therefore, in order to complete these expressions, we must know x² and xy, and the sums of x, y, x², and xy. In order to get these values we repeat the data (coded) from Table 1 in Table 2. Although we do not need y² in this immediate computation, it will be of value to us later on.

Table 2

  n      y       x       xy        x²       y²
  1     1.5     4.0     6.00     16.00     2.25
  2     2.0     5.3    10.60     28.09     4.00
  3     2.5     7.3    18.25     53.29     6.25
  4     3.0     9.2    27.60     84.64     9.00
  5     3.5    10.7    37.45    114.49    12.25
  6     4.0    12.0    48.00    144.00    16.00
  7     4.5    13.5    60.75    182.25    20.25
  8     5.0    15.0    75.00    225.00    25.00
  Σ    26.0    77.0   283.65    847.76    95.00

Substituting in (2):

$$m = \frac{8(283.65) - (77.0)(26.0)}{8(847.76) - (77.0)^2} = .27$$

Now substituting in (3):

$$d = \frac{1}{8}\left[26.0 - .27(77.0)\right] = .65$$

Therefore the formula is

$$\hat{y} = .27x + .65$$
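The substitution into equations (2) and (3) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fit_line` and the variable names are assumptions, not part of the original article, and the data are the coded values of Table 2. Run as written, exact arithmetic on these sums gives a slope near 0.313 and an intercept near 0.235 rather than the printed .27 and .65, so the printed slope may contain an arithmetic slip and is worth rechecking before use.

```python
# Coded data from Table 2: y = horsepower / 100, x = cost / 1000.
y = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
x = [4.0, 5.3, 7.3, 9.2, 10.7, 12.0, 13.5, 15.0]


def fit_line(x, y):
    """Return (m, d) for y = m*x + d using equations (2) and (3).

    fit_line is a hypothetical helper name used only for this sketch.
    """
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Equation (2): slope from the sums.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Equation (3): intercept from the slope and the sums.
    d = (sum_y - m * sum_x) / n
    return m, d


m, d = fit_line(x, y)
print(f"m = {m:.3f}, d = {d:.3f}")
```

Working from the raw columns rather than from hand-copied totals keeps the sums and the coefficients consistent with each other.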

We refer to y as ŷ because this is, in reality, an estimate of the y value, since we are using a finite sample to get the expression.

Quadratic Equation

In most cases, the data which we are considering will fall into either the linear or the quadratic model; therefore we will limit our discussion to these cases. To illustrate this situation let us again consider a hypothetical case. This time y = cost and x = barrel capacity of steel, cone roof storage tanks.

Table 3
Cone Roof Storage Tanks

  x (capacity, barrels)    y (cost)
         1000                4,700
         2000                8,500
         3000                8,800
         4000               11,200
         5000               10,900
         6000               13,000
         7000               13,200
         8000               13,700

Again, for ease of computation, we will code the data by dividing capacity and cost each by 1000. If we plot the above listed data, we obtain the curve shown in Figure 2. Also, again for the present, let us assume that we know the data to be a second degree curve. The general form of the quadratic (second degree) curve is

$$y = a + bx + cx^2 \tag{4}$$

[Figure 2]

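The normal equations for the quadratic case are

$$\sum y = na + b\sum x + c\sum x^2 \tag{5}$$

$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \tag{6}$$

$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \tag{7}$$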

Now code the data in Table 3 and compute the values required by equations (5), (6), and (7).

Table 4

  n     y       y²      x    x²     x³     x⁴      xy      x²y
  1    4.7    22.09     1     1      1      1      4.7      4.7
  2    8.5    72.25     2     4      8     16     17.0     34.0
  3    8.8    77.44     3     9     27     81     26.4     79.2
  4   11.2   125.44     4    16     64    256     44.8    179.2
  5   10.9   118.81     5    25    125    625     54.5    272.5
  6   13.0   169.00     6    36    216   1296     78.0    468.0
  7   13.2   174.24     7    49    343   2401     92.4    646.8
  8   13.7   187.69     8    64    512   4096    109.6    876.8
  Σ   84.0   946.96    36   204   1296   8772    427.4   2561.2

Therefore, substituting in (5), (6), and (7):

$$84.0 = 8a + 36b + 204c$$

$$427.4 = 36a + 204b + 1296c$$

$$2561.2 = 204a + 1296b + 8772c$$
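Since this three-equation system is the part most tedious to do by hand, a small numerical sketch may help. It uses Cramer's rule only because that keeps the example dependency-free; the helper names `det3` and `solve3` are illustrative assumptions, and the coefficients are the sums from Table 4.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(A, rhs):
    """Solve the 3x3 linear system A * u = rhs by Cramer's rule."""
    det_a = det3(A)
    solution = []
    for col in range(3):
        # Replace one column of A with the right-hand side.
        replaced = [row[:] for row in A]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / det_a)
    return solution


# Coefficient matrix and right-hand side built from the Table 4 sums.
A = [[8, 36, 204],
     [36, 204, 1296],
     [204, 1296, 8772]]
rhs = [84.0, 427.4, 2561.2]

a, b, c = solve3(A, rhs)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
```

For larger systems a general elimination routine or a linear algebra library would be the more usual choice; Cramer's rule is shown here purely for brevity.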

Solving we get a = 2.94, b = 2.54, c = -.15. Therefore ŷ = 2.94 + 2.54x - .15x².

The normal equations are somewhat complicated to solve manually. In this case, it would be wise to translate the y axis by means of the expression:

$$x' = x - \bar{x}$$

In effect, this makes the mean of the x's = 0, that is, it distributes the values of x symmetrically about the origin. This results in a simplified set of normal equations with but two unknowns.
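The simplification can be sketched as follows, taking the translation to be u = x - x̄ (the symbol u and the function name `fit_quadratic_centered` are assumptions made for this sketch). When the x values are symmetric about their mean, the sums of u and u³ vanish, so b comes from a single equation and only a and c remain to be solved together.

```python
# Coded tank data from Table 4.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [4.7, 8.5, 8.8, 11.2, 10.9, 13.0, 13.2, 13.7]


def fit_quadratic_centered(x, y):
    """Fit y = a + b*x + c*x**2 after shifting x to have zero mean.

    Assumes the x values are symmetric about their mean, so that the
    sums of u and u**3 are zero (true for equally spaced x).
    """
    n = len(x)
    x_bar = sum(x) / n
    u = [xi - x_bar for xi in x]

    s_y = sum(y)
    s_u2 = sum(ui ** 2 for ui in u)
    s_u4 = sum(ui ** 4 for ui in u)
    s_uy = sum(ui * yi for ui, yi in zip(u, y))
    s_u2y = sum(ui ** 2 * yi for ui, yi in zip(u, y))

    # Middle normal equation decouples: sum(u*y) = b_u * sum(u**2).
    b_u = s_uy / s_u2
    # Remaining two equations in the two unknowns a_u and c.
    det = n * s_u4 - s_u2 ** 2
    a_u = (s_y * s_u4 - s_u2 * s_u2y) / det
    c = (n * s_u2y - s_u2 * s_y) / det

    # Convert back from powers of u = x - x_bar to powers of x.
    a = a_u - b_u * x_bar + c * x_bar ** 2
    b = b_u - 2 * c * x_bar
    return a, b, c


a, b, c = fit_quadratic_centered(x, y)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}")
```

The coefficients agree with those obtained from the full three-equation system, which is a convenient cross-check on either calculation.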
Coefficient of Correlation

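The r value expresses the degree of relationship between the variables and, as such, seeks to determine how well a linear or quadratic expression describes or explains the relationship between variables. The expression for the coefficient of correlation for the linear case is

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \tag{8}$$

Substituting from Table 2: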
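As a rough check, the coefficient of correlation can be computed directly from the coded data. The sketch below assumes equation (8) is the usual sums form of the product-moment coefficient, built from the sums of x, y, xy, x² and y²; the function name `correlation` is illustrative. On the motor data of Table 2 it gives r of roughly 0.998, and on the tank data of Table 4 roughly 0.946 (r² about 0.894), so a quoted value near .94 appears closer to the tank figure and may have been rounded or computed slightly differently.

```python
from math import sqrt


def correlation(x, y):
    """Linear coefficient of correlation from raw sums.

    Assumed to correspond to equation (8); uses the same sums as the
    normal equations plus the sum of y squared.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Table 2 (motors): x = cost / 1000, y = horsepower / 100.
motor_x = [4.0, 5.3, 7.3, 9.2, 10.7, 12.0, 13.5, 15.0]
motor_y = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# Table 4 (tanks): x = capacity / 1000, y = cost / 1000.
tank_x = [1, 2, 3, 4, 5, 6, 7, 8]
tank_y = [4.7, 8.5, 8.8, 11.2, 10.9, 13.0, 13.2, 13.7]

r_motors = correlation(motor_x, motor_y)
r_tanks = correlation(tank_x, tank_y)
print(f"motors: r = {r_motors:.3f}, r^2 = {r_motors ** 2:.3f}")
print(f"tanks:  r = {r_tanks:.3f}, r^2 = {r_tanks ** 2:.3f}")
```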

If we square the coefficient of correlation we get the r² value which, when expressed as a percent, is defined as the percent of variation between x and y which is explained by the regression method. Saying this another way, it is the fraction of the variation in y accounted for by the variation in x. The difference between the r² value and 1 could be said to be due to random fluctuations or to an additional variable which has not been considered. The r² value for the linear case is (.94)² or .88.

Obtaining an r² value for the quadratic case is somewhat more involved. Expressing the coefficient of correlation narratively we can say

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} \tag{9}$$

From Table 4 we get y in Table 5.

Table 5

    y       ȳ      y - ȳ    (y - ȳ)²
   4.7    10.5     -5.8      33.64
   8.5    10.5     -2.0       4.00
   8.8    10.5     -1.7       2.89
  11.2    10.5       .7        .49
  10.9    10.5       .4        .16
  13.0    10.5      2.5       6.25
  13.2    10.5      2.7       7.29
  13.7    10.5      3.2      10.24
  84.0                       64.96

ȳ = Σy/n = 84.0/8 = 10.5

Therefore Σ(y - ȳ)² = 64.96 = total variation. Now, to get the "explained variation" in (9) we must get Σ(ŷ - ȳ)². To do this we must substitute the x value for each n in ŷ = 2.94 + 2.54x - .15x², getting

Table 6

  x     y       ŷ       ȳ     ŷ - ȳ   (ŷ - ȳ)²
  1    4.7    5.33    10.5    -5.17     26.73
  2    8.5    7.42    10.5    -3.08      9.49
  3    8.8    9.21    10.5    -1.29      1.66
  4   11.2   10.70    10.5      .20       .04
  5   10.9   11.89    10.5     1.39      1.93
  6   13.0   12.78    10.5     2.28      5.20
  7   13.2   13.37    10.5     2.87      8.24
  8   13.7   13.66    10.5     3.16      9.99
  Σ                                     63.27
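The columns of Table 5 and Table 6 lend themselves to a short program. The sketch below evaluates the fitted quadratic with the rounded coefficients given in the text and forms the two sums needed by equation (9); the function name `predict` and the variable names are illustrative assumptions.

```python
# Coded tank data from Table 4.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [4.7, 8.5, 8.8, 11.2, 10.9, 13.0, 13.2, 13.7]


def predict(xi):
    """Fitted quadratic with the rounded coefficients from the text."""
    return 2.94 + 2.54 * xi - 0.15 * xi ** 2


y_bar = sum(y) / len(y)

# Total variation: squared deviations of the observed y about the mean.
total_variation = sum((yi - y_bar) ** 2 for yi in y)

# Explained variation: squared deviations of the fitted values about the mean.
explained_variation = sum((predict(xi) - y_bar) ** 2 for xi in x)

r_squared = explained_variation / total_variation
print(f"total = {total_variation:.2f}, explained = {explained_variation:.2f}")
print(f"r^2 = {r_squared:.3f}")
```

Because the coefficients are rounded to two decimals, the ratio is itself only good to about two or three decimal places.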

The Σ(ŷ - ȳ)² of 63.27 is the explained variation. Thus

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{63.27}{64.96} = .975$$

Linear or Quadratic?

In the above examples we assumed that we knew that the expression was linear or quadratic. We will now investigate a procedure to determine quantitatively whether the data are first or second degree. If we consider the case of the cone roof tanks, we obtain an r² of linear correlation of .884 using equation (8) and the data from Table 4. We have already seen above that the r² value for nonlinear correlation, assuming a parabolic relationship, is .975.

At this point it is necessary to deviate to review the application of the F test. The F test is a ratio of variances. It provides us with a method for determining whether the ratio of two variances is larger than might be expected by chance. Since our r² values are calculated using variances, we can equate a ratio of r²'s with a ratio of variances. Next we must specify some level of confidence with which to evaluate the results. Let us say that a 95% level of confidence is satisfactory for the type of estimating work that we are doing. Consider the following table:

Table 7

  Case          r²     n    k    1 - r²    n - 1 - k    (1 - r²)/(n - 1 - k)
  Linear       .884    8    1     .116         6
  Quadratic    .975    8    2     .025         5               .005
  Difference   .091

where n = number of observations
      k = degrees of freedom associated with linearity and quadratic, always 1 and 2, respectively

Therefore the F test in this case is

$$F_{1,\,n-1-k} = \frac{\left(r^2\right)_{\text{Difference}}}{\left(\dfrac{1 - r^2}{n-1-k}\right)_{\text{Quadratic}}}$$

$$F_{1,5} = \frac{.091}{.005} = 18.20$$

Now, this must be compared with the tabular value of F with 1 and 5 degrees of freedom at the 5% level of significance (that is, 95% confidence). These tables can be found in almost any statistics book. The tabular value of F is 6.61. Since tabular F is less than calculated F, we can say that there is a significant difference between the r² value for the quadratic case and the r² value for the linear case; therefore our curve is a second degree one. If there were no significant difference, then we would assume the data to be linear, since no apparent advantage would be gained by fitting the data to the quadratic model.
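The decision rule can be written as a small routine for a computerized estimating system. This is a sketch under stated assumptions: the function name `f_ratio` is hypothetical, the r² values are the ones quoted in Table 7, and the critical value 6.61 is simply the tabular figure quoted in the text rather than something computed here.

```python
def f_ratio(r2_linear, r2_quadratic, n, k_quadratic=2):
    """F statistic for the gain in r^2 from adding the quadratic term.

    Numerator: difference in r^2 (one degree of freedom).
    Denominator: unexplained fraction of the quadratic fit divided by
    its n - 1 - k degrees of freedom.
    """
    numerator = r2_quadratic - r2_linear
    denominator = (1 - r2_quadratic) / (n - 1 - k_quadratic)
    return numerator / denominator


# Tabular F for 1 and 5 degrees of freedom at the 5% level, as quoted in the text.
F_TABLE_1_5 = 6.61

f_calc = f_ratio(r2_linear=0.884, r2_quadratic=0.975, n=8)
use_quadratic = f_calc > F_TABLE_1_5

print(f"calculated F = {f_calc:.2f}, tabular F = {F_TABLE_1_5}")
print("use quadratic model" if use_quadratic else "use linear model")
```

For a different number of observations the tabular value changes with the denominator degrees of freedom, so the constant would need to be replaced by a lookup from an F table.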
