Sometimes, however, it is also useful to quantify that association. This is traditionally done using regression analysis. For example, in the instance of the association between CaO and Al2O3 in the tonalites and trondhjemites of Table 2.2, the question 'If the CaO concentration were 3.5 wt %, what would be the concentration of Al2O3?' can be answered by calculating the regression equation for the variables CaO and Al2O3.

The quantification of an association is carried out by fitting a straight line through the data and finding the equation of that line. The equation for a straight line relating variables x and y is

$y = a + bx$    [2.4]

The constant a is the value of y given by the straight line at x = 0. The constant b is the slope of the line and shows the number of units increase (or decrease) in y that accompanies an increase of one unit in x. The constants a and b are determined by fitting the straight line to the data.

The relation above is ideal and does not allow for any deviation from the line. In reality, however, this is not the case, for most observations are made with some error; often the data form a cloud of points to which a straight line must be fitted. It is this which introduces some uncertainty into line-fitting procedures and has resulted in a number of alternative approaches. Regression analysis is the subject of a number of statistical texts (e.g. Draper and Smith, 1981) and a useful review of fitting procedures in the earth sciences is given by Troutman and Williams (1987). Below some of the more popular forms of regression are described.
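The roles of the two constants in the straight-line equation can be illustrated with a minimal Python sketch. The values of a and b and the three observed points are hypothetical, chosen only for illustration; they are not taken from Table 2.2 and are not fitted to any real data.

```python
def line(a, b, x):
    """Evaluate the straight line y = a + b*x (Eq. 2.4)."""
    return a + b * x

# Hypothetical constants, assumed purely for illustration.
a, b = 13.0, 1.0

# The intercept a is the value of y given by the line at x = 0.
y_at_zero = line(a, b, 0.0)

# The slope b is the change in y for an increase of one unit in x.
rise_per_unit = line(a, b, 4.5) - line(a, b, 3.5)

# Answering a question of the form "if x were 3.5, what would y be?"
y_pred = line(a, b, 3.5)

# Invented observations: real data scatter about the ideal line,
# so each point has a non-zero vertical deviation from it.
observed = [(2.0, 15.3), (3.0, 15.8), (4.0, 17.4)]
deviations = [y - line(a, b, x) for x, y in observed]
```

With real data the constants are not assumed but estimated from the cloud of points, and the choice of which deviations to minimize is what distinguishes the fitting procedures.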
2.4.1 Ordinary least squares regression
Ordinary least squares regression is traditionally one of the most commonly used line-fitting techniques in geochemistry because it is relatively simple to use and because computer software with which to perform the calculations is generally readily available. Unfortunately, it is often not appropriate.

The least squares best-fit line is constructed so that the sum of the squares of the vertical deviations about the line is a minimum. In this case the variable x is the independent (non-random) variable and is assumed to have a very small error; y, on the other hand, is the dependent variable (the random variable), with errors an order of magnitude or more greater than the errors on x, and is to be determined from values of x. In this case we say that y is regressed on x (Figure 2.2a). It is possible to regress x on y, and in this case the best-fit line minimizes the sum of the squares of the horizontal deviations about the line (Figure 2.2b). Thus there are two possible regression lines for the same data, a rather unsatisfactory situation for physical scientists, who prefer a unique line. The two lines intersect at the mean of the sample (Figure 2.2c) and approach each other as the value of the correlation coefficient (r) increases until they coincide at r = 1.

In the case of ordinary least squares regression, where y is regressed on x, the value of the intercept, a, may be computed from

$a = \bar{y} - b\bar{x}$    [2.5]

where $\bar{x}$ and $\bar{y}$ are the mean values for variables x and y and b is the slope of the line. The slope b is computed from

$b = r(S_y / S_x)$    [2.6]
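As a rough illustration of Eqs 2.5 and 2.6 and of the two regression lines, the following Python sketch computes both fits for a small data set. It assumes that r is the Pearson product-moment correlation coefficient and that S_x and S_y are the sample standard deviations of x and y, which this passage does not define. The data values and the function names are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation coefficient r (assumed here to be Pearson's)."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ols(indep, dep):
    """Regress dep on indep; return (intercept, slope)."""
    r = pearson_r(indep, dep)
    slope = r * stdev(dep) / stdev(indep)          # Eq. 2.6
    intercept = mean(dep) - slope * mean(indep)    # Eq. 2.5
    return intercept, slope

# Hypothetical data, not taken from any table in the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [14.1, 14.9, 16.2, 16.8, 18.0]

# y regressed on x: minimizes the vertical deviations.
a_yx, b_yx = ols(xs, ys)

# x regressed on y: minimizes the horizontal deviations, x = c + d*y.
c, d = ols(ys, xs)

# Rewrite the x-on-y line as y = a + b*x so the two can be compared.
b_xy = 1.0 / d
a_xy = -c / d

r = pearson_r(xs, ys)
```

Both lines pass through the point of the sample means, and the product of the two fitted slopes, b_yx and d, equals r squared, so the lines draw together as r approaches 1.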