
Linear and nonlinear regression techniques using the Excel Solver tool
Spreadsheets by S.K. Dentel, 10/06.

These worksheets describe the use of regression for data analysis. It is assumed that you already know how to use the Excel Regression tool. Here we will cover:
1) Linear regression using Excel Solver
2) Nonlinear regression using Excel Solver
We will not worry about derivations at this point.

Why are we doing linear regression by two different methods? This is for two reasons: First, I want you to learn how to use the Solver tool because it is useful for lots of situations when you cannot get an analytical solution for an unknown. Second, if we first use the Solver to duplicate another method, we can be sure that it works before applying it to more difficult problems like nonlinear regression.

So the data below (in yellow) are what we want to analyze. First we graph them to check for outliers or other anomalies. The graph is shown below, and it has some noise, but nothing that's totally unreasonable.

Carbon dose   C        C0-C = X   X/M = q
mg/L          mg/L     mg/L       mg/g
0 (control)   0.110
1             0.059    0.051      51.0
2.5           0.015    0.095      38.0
5             0.01     0.100      20.0
10            0.008    0.102      10.2
25            0.001    0.109      4.36
100           0.0006   0.109      1.09
              0                   0.00

[Graph: q (mg/g) versus C (mg/L)]

Now we want to compare these data to models, which means equations that we think will fit the data. We would like to see which model equation fits the data best. This equation can then be used to predict or upscale the adsorption process, from the 1-liter experiment to the pond, and to any carbon dose. Here are three possibilities:
1. Linear equation: $q = mc + b$
2. Langmuir equation: $q = q_{max} \frac{K_L c}{1 + K_L c}$
3. Freundlich equation: $q = K_F c^{1/n}$

1. Now let's try fitting the adsorption data to the linear model, using Solver.

First, we'll do a linear regression on q vs. C. I know, the data don't look linear, but this is to show you how to do a regression using the Solver tool. Since you already know how to use Excel's linear regression capability (if not, go to the last worksheet), you'll be able to check your results.

So our function is q = mC+b. We want the best values for m and b. Here are the steps:
1. Put estimated values for m and b in a couple of cells (shown here in orange). We'll use these in the above equation for our estimated q values.
2. Put the estimated q's in a new column. We'll call this "q-hat".
3. We'll now create a number that indicates how good our estimated q values are. In another column, compute q [the measured value] minus q-hat, and SQUARE this. Sum these up: it's the "residual sum of squares" or RSS.
4. Note that if all of our estimates were exactly equal to the data, this RSS would be zero. The better the fit, the lower RSS is. So if we can readjust m and b to get a better fit, the RSS will go down. The best values of m and b will minimize the RSS. There is a tool that does this automatically, called SOLVER. If it's not under the TOOLS menu, you can get it by going to Tools/Add-Ins and adding it.
5. In Solver, set the target cell as the cell where the RSS is calculated. Have it find the MIN value,* by changing the cells that have the values for m and b. When you hit SOLVE, m and b will be changed iteratively until the RSS is minimized. The linear regression is done!

Values: m=        b=

Carbon dose   C        C0-C = X    X/M = q     q-hat     (q-q-hat)^2
mg/L          mg/L     mg/L        mg/g        mg/g      values:
0 (control)   0.110    (exclude)   (exclude)
1             0.059    0.051       51.0
2.5           0.015    0.095       38.0
5             0.01     0.1         20.0
10            0.008    0.102       10.2
25            0.001    0.109       4.36
100           0.0006   0.1094      1.09
              0                    0.00
                                               RSS = Sum:

Do this yourself by filling in the equations in the empty cells of the table, then use SOLVE to minimize RSS. When done, go to the next sheet.

*Note that when you use "minimum" rather than "equals," you are not really using Solver to Solve, only to get the lowest value possible. It's still a trial-and-error approach, done by numerical iteration.
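For readers who want to see the same idea outside the spreadsheet, here is a minimal Python sketch of the q-hat and RSS columns for the linear model. It does not reproduce Solver's iterations; as a check it uses the standard closed-form least-squares formulas for a straight line, which is the result Solver should approach. The names `rss`, `m_best` and `b_best` are illustrative, and the last q is entered as 1.094 on the assumption that the displayed 1.09 is the rounded value of 0.1094/100 x 1000.

```python
# Data from the worksheet (control row excluded; last row is the C = 0, q = 0 point).
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]   # mg/L
# Assumption: 1.094 is the unrounded value behind the displayed 1.09 mg/g.
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]        # mg/g


def rss(m, b):
    """Residual sum of squares for the trial line q-hat = m*C + b."""
    return sum((qi - (m * ci + b)) ** 2 for ci, qi in zip(C, q))


# Closed-form least-squares slope and intercept for a straight line.
n = len(C)
sum_c, sum_q = sum(C), sum(q)
sum_cq = sum(ci * qi for ci, qi in zip(C, q))
sum_cc = sum(ci * ci for ci in C)
m_best = (n * sum_cq - sum_c * sum_q) / (n * sum_cc - sum_c ** 2)
b_best = (sum_q - m_best * sum_c) / n

print("RSS for a rough guess (m=500, b=10):", rss(500.0, 10.0))
print("Least-squares m, b:", m_best, b_best)
print("RSS at the least-squares values:", rss(m_best, b_best))
```

Any other trial pair (m, b) passed to `rss` should give a larger number than the least-squares pair, which is exactly what Solver exploits when it adjusts the two cells.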

Example use of linear and nonlinear regression techniques to find best-fit model

Below is what you should have obtained. Is this the right result? We can check it using Excel's built-in linear regression capability. Choose Data/Data Analysis/Regression, choose the q values for the Y-range, the C values for the X-range. If you did it right, the m and b values will be the same. It is always a good policy to GRAPH the model line against the data. This is done in the graph on this sheet.

Of course, the built-in linear regression also gives an r^2 value and lots of other information. We can calculate all of that too, but what you would probably like the most is r^2. This is not difficult to get. The formula is

$$r^2 = 1 - \frac{RSS}{TSS}$$

where RSS is the residual sum of squares, which you already have, and TSS is the total sum of squares. The TSS is most easily calculated by taking the average value of q (call it q-bar), and then determining the difference between each q and q-bar. Square these differences and sum them. This sum of squares of residuals between the q's and q-bar is the TSS. Now you have r^2, and it should agree with the R Square already given by Excel's built-in linear regression.

Values: m = 839.30751   b = 6.5849737

Carbon dose   C        C0-C = X   X/M = q   q-hat        (q-q-hat)^2   (q-q-bar)^2
mg/L          mg/L     mg/L       mg/g      mg/g         values:       values:
0 (control)   0.110
1             0.059    0.051      51.0      56.1041168   26.052009
2.5           0.015    0.095      38.0      19.1745864   354.3962
5             0.01     0.1        20.0      14.9780488   25.219994
10            0.008    0.102      10.2      13.2994338   9.6064898
25            0.001    0.109      4.36      7.42428121   9.3898193
100           0.0006   0.1094     1.09      7.08855821   35.934728
              0                   0.00      6.5849737    43.361879
                       q-bar:               Sum=RSS:     503.96112     Sum=TSS:

Exercise: add the equations to compute q-bar, the squared residuals between q and q-bar, and then the sum of these, which is the TSS.

After you have this column and its sum, compute r^2 below. See if it agrees with the value in "Summary Output" below.

r^2 = 1-RSS/TSS=

If it does NOT agree, check your cell entries for correctness. The cell entry for q-bar, the right-most column and its sum for TSS, and the formula for r^2 should all be the same as on the next sheet.

When done, go to the next sheet.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.886283
R Square            0.785498
Adjusted R Square   0.742597
Standard Error      10.03953
Observations        7

ANOVA
             df   SS         MS         F            Significance F
Regression   1    1845.483   1845.483   18.3097707   0.0078711
Residual     5    503.9611   100.7922
Total        6    2349.444

               Coefficients   Standard Error   t Stat     P-value      Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      6.584967       4.612779         1.427549   0.21277735   -5.27254    18.44247    -5.27254      18.44247
X Variable 1   839.308        196.1462         4.278992   0.00787106   335.09896   1343.517    335.099       1343.517

[Graph: q (mg/g) versus C (mg/L), data with the fitted model line]
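The r^2 exercise on this sheet can be cross-checked with a few lines of Python. This is only a sketch that mirrors the worksheet columns (q-hat, squared residuals, q-bar, squared deviations from q-bar). It uses the m and b values reported above, and it again assumes the unrounded 1.094 for the last measured q; the variable names are illustrative.

```python
# Worksheet data (control row excluded; last row is the C = 0, q = 0 point).
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]   # mg/L
# Assumption: 1.094 is the unrounded value behind the displayed 1.09 mg/g.
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]        # mg/g

m, b = 839.30751, 6.5849737   # Solver values reported on the worksheet

q_hat = [m * ci + b for ci in C]                 # "q-hat" column
q_bar = sum(q) / len(q)                          # average of the measured q
RSS = sum((qi - qh) ** 2 for qi, qh in zip(q, q_hat))
TSS = sum((qi - q_bar) ** 2 for qi in q)
r2 = 1 - RSS / TSS

print("q-bar =", q_bar)
print("RSS =", RSS, " TSS =", TSS)
print("r^2 =", r2)
```

The printed r^2 should agree with "R Square" in the Summary Output, and RSS and TSS should agree with the Residual and Total entries in the SS column of the ANOVA table.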

Why did we go to the trouble of using SOLVE to do a linear regression, when Excel has a built-in capability to do this? Linear regressions are easy, because there are explicit solutions for m, b, and r^2. For other cases, it's not so straightforward. But now that you see how a linear regression works, you can use the same approach, and SOLVER, for ANY equation. If the equation is anything other than the equation of a line, it's known as NON-linear regression. The method even works for complex models that use more than one equation.

So let's try a Langmuir isotherm,

$$q = q_{max} \frac{K_L c}{1 + K_L c}$$

This is easy. Change the labels for the two fitting parameters to "qmax" and "KL." Rewrite the equation for "q-hat" to use these parameters as in the Langmuir equation. Use "SOLVE" exactly as before, minimizing the RSS by adjusting qmax and KL. You'll get the best values for these two parameters, and the r^2 value. You can also solve by maximizing the cell with the r^2 value - you'll get the same result. Once again: ALWAYS plot the data and the model on the same graph for visual examination!

Values: qmax = 73.26604   KL = 42.60169

Carbon dose   C        C0-C = X   X/M = q   q-hat      (q-q-hat)^2   (q-q-bar)^2
mg/L          mg/L     mg/L       mg/g      mg/g       values:       values:
0 (control)   0.110
1             0.059    0.051      51.0      52.41332   1.997466      1101.728
2.5           0.015    0.095      38.0      28.56506   89.0181       407.7284
5             0.01     0.1        20.0      21.88794   3.564314      4.806117
10            0.008    0.102      10.2      18.62306   70.94802      57.87732
25            0.001    0.109      4.36      2.99372    1.866722      180.841
100           0.0006   0.1094     1.09      1.826078   0.535938      279.3482
              0                   0.00      0          0             317.1147
                       q-bar:     17.8      Sum=RSS:   167.9306      Sum=TSS: 2349.444

r^2 = 1-RSS/TSS = 0.928523

[Graph: q (mg/g) versus C (mg/L), data with the Langmuir model] (A smoother plot is constructed further down on this sheet)

For a smoother model curve, create a column with many more values for the x-axis, and compute the y-value for each of them:

C       q-hat          C       q-hat          C       q-hat
0       0              0.034   43.34269       0.068   54.465
0.002   5.752391       0.036   44.34897       0.07    54.86728
0.004   10.66726       0.038   45.28978       0.072   55.2527
0.006   14.91509       0.04    46.17129       0.074   55.62231
0.008   18.62306       0.042   46.99895       0.076   55.97705
0.01    21.88794       0.044   47.77754       0.078   56.31781
0.012   24.78466       0.046   48.51131       0.08    56.6454
0.014   27.37218       0.048   49.204         0.082   56.96056
0.016   29.6975        0.05    49.85899       0.084   57.264
0.018   31.79854       0.052   50.47927       0.086   57.55634
0.02    33.70627       0.054   51.06752       0.088   57.8382
0.022   35.44618       0.056   51.62616       0.09    58.11012
0.024   37.03949       0.058   52.15737       0.092   58.37262
0.026   38.50398       0.06    52.66313       0.094   58.62618
0.028   39.85467       0.062   53.14522       0.096   58.87126
0.03    41.10432       0.064   53.60527       0.098   59.10826
0.032   42.26386       0.066   54.04474       0.1     59.33758

[Graph: q (mg/g) versus C (mg/L), data with the smooth Langmuir curve]
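To see the Langmuir fit reproduced without a spreadsheet, the sketch below minimizes the same RSS in plain Python. It is a stand-in technique, not a description of what Solver does internally: because qmax multiplies the rest of the Langmuir expression, the best qmax for any trial KL has a closed form, so only KL needs a one-dimensional search (a golden-section search here). The search interval of 1 to 500 for KL, the function names, and the unrounded q value of 1.094 are assumptions made for this illustration.

```python
import math

C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]   # mg/L
# Assumption: 1.094 is the unrounded value behind the displayed 1.09 mg/g.
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]        # mg/g


def profile_rss(KL):
    """For a trial KL, return (RSS, qmax) with qmax set to its best value."""
    g = [KL * c / (1 + KL * c) for c in C]            # Langmuir shape term
    qmax = sum(qi * gi for qi, gi in zip(q, g)) / sum(gi * gi for gi in g)
    rss = sum((qi - qmax * gi) ** 2 for qi, gi in zip(q, g))
    return rss, qmax


def golden_min(f, lo, hi, tol=1e-6):
    """Golden-section search; assumes f has a single minimum on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                       # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = f(x1)
        else:                             # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = f(x2)
    return (a + b) / 2


KL_best = golden_min(lambda k: profile_rss(k)[0], 1.0, 500.0)
RSS, qmax_best = profile_rss(KL_best)
print("qmax =", qmax_best, " KL =", KL_best, " RSS =", RSS)
```

The printed values should land close to the qmax, KL and RSS shown on this sheet. The same trick works for any model in which one parameter is a simple multiplier, but Solver's two-cell adjustment is more general.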


So that's it. In general, your steps for optimizing the fit of ANY equation or model to data go like this:
1. Tabulate and graph the data.
2. Choose the equation you want to use, and place estimated values for the equation parameters in cells.
3. Add a column to your table with computed y values ("y-hat") using the x's, the equation, and the fitting parameters.
4. Add a column with computed squares of residuals (y - y-hat)^2, and sum these to get the residual sum of squares, RSS.
5. Add a cell to compute the average y (y-bar), and use it in a new column to compute (y - y-bar)^2.
6. Sum these to get the total sum of squares, TSS, and use it with the RSS to compute r^2.
7. Use SOLVE to find the parameter values that maximize r^2 or minimize the RSS. (This is why it's called a LEAST SQUARES method).
8. You're not done until you PLOT the model equation along with the data for visual comparison.

If you do the above steps for more than one model equation (such as a comparison of the Langmuir and Freundlich equations), the model with the highest r^2 is the best fit. (This is NOT the case if using the more conventional method of plotting linearized equations such as reciprocal or log-log plots to get the fitting parameters, because this method gives r^2 values for the linearized plot, not the original plot, so they're not comparisons on the same basis.)
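As an illustration of comparing models on the same basis, the sketch below computes r^2 = 1 - RSS/TSS in the original q-versus-C space for the three models, using the fitted parameter values reported on these worksheets. The helper `r_squared`, the dictionary layout, and the unrounded q value of 1.094 are assumptions for this example; no fitting is done here.

```python
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]   # mg/L
# Assumption: 1.094 is the unrounded value behind the displayed 1.09 mg/g.
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]        # mg/g


def r_squared(model, params):
    """r^2 = 1 - RSS/TSS for model(c, *params) against the measured q."""
    q_bar = sum(q) / len(q)
    rss = sum((qi - model(ci, *params)) ** 2 for ci, qi in zip(C, q))
    tss = sum((qi - q_bar) ** 2 for qi in q)
    return 1 - rss / tss


# Each entry: (model function, parameter values reported on the worksheets).
models = {
    "linear": (lambda c, m, b: m * c + b, (839.30751, 6.5849737)),
    "Langmuir": (lambda c, qmax, KL: qmax * KL * c / (1 + KL * c),
                 (73.26604, 42.60169)),
    "Freundlich": (lambda c, KF, inv_n: KF * c ** inv_n,
                   (243.3178, 0.535729)),
}

scores = {name: r_squared(f, p) for name, (f, p) in models.items()}
best = max(scores, key=scores.get)

for name, value in scores.items():
    print(name, round(value, 6))
print("Best fit by r^2:", best)
```

Because every r^2 is computed against the same measured q values, the numbers are directly comparable, which is the point of the caution about linearized plots.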

Below is a fit for the Freundlich isotherm, and a plot showing it and the Langmuir. The Langmuir is a better fit.

Values: 1/n = 0.535729   KF = 243.3178

Carbon dose   C        C0-C = X   X/M = q   q-hat         (q-q-hat)^2   (q-q-bar)^2
mg/L          mg/L     mg/L       mg/g      mg/g          values:       values:
0 (control)   0.110
1             0.059    0.051      51.0      53.41742805   5.843958      1101.728
2.5           0.015    0.095      38.0      25.64791705   152.574       407.7284
5             0.01     0.1        20.0      20.64024369   0.409912      4.806117
10            0.008    0.102      10.2      18.31459287   65.84662      57.87732
25            0.001    0.109      4.36      6.011534347   2.727566      180.841
100           0.0006   0.1094     1.09      4.572296821   12.09855      279.3482
              0                   0.00      0             0             317.1147
                       q-bar:     17.8      Sum=RSS:      239.5006      Sum=TSS: 2349.444

r^2 = 1-RSS/TSS = 0.898061

Smoother model curve:

C       q-hat            C       q-hat            C       q-hat
0       0                0.034   39.75975947      0.068   57.63872876
0.002   8.714770973      0.036   40.99609343      0.07    58.54081448
0.004   12.63358549      0.038   42.20092688      0.072   59.43101118
0.006   15.69870771      0.04    43.37666028      0.074   60.30979946
0.008   18.31459287      0.042   44.52540142      0.076   61.17762809
0.01    20.64024369      0.044   45.64901314      0.078   62.03491692
0.012   22.75802388      0.046   46.74915164      0.08    62.8820594
0.014   24.71723679      0.048   47.82729728      0.082   63.7194249
0.016   26.55020717      0.05    48.88477975      0.084   64.54736069
0.018   28.27950666      0.052   49.92279879      0.086   65.36619376
0.02    29.92164499      0.054   50.94244136      0.088   66.17623249
0.022   31.48913623      0.056   51.94469599      0.09    66.97776805
0.024   32.99173796      0.058   52.93046488      0.092   67.77107576
0.026   34.43723542      0.06    53.90057413      0.094   68.55641624
0.028   35.83195991      0.062   54.85578248      0.096   69.33403652
0.03    37.1811438       0.064   55.79678875      0.098   70.10417102
0.032   38.48917121      0.066   56.72423834      0.1     70.86704241

[Graph: q (mg/g) versus C (mg/L); legend: Data, Langmuir, Freundlich]
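The "smoother model curve" columns can be generated the same way in Python. This short sketch evaluates both fitted isotherms on a grid from 0 to 0.1 mg/L in steps of 0.002, using the parameter values reported on these sheets; the function and variable names are illustrative only.

```python
# Fitted parameter values reported on the worksheets.
qmax, KL = 73.26604, 42.60169      # Langmuir
KF, inv_n = 243.3178, 0.535729     # Freundlich (inv_n is 1/n)


def langmuir(c):
    return qmax * KL * c / (1 + KL * c)


def freundlich(c):
    return KF * c ** inv_n


# 51 evenly spaced concentrations: 0, 0.002, ..., 0.1 mg/L
grid = [round(0.002 * i, 3) for i in range(51)]
curve = [(c, langmuir(c), freundlich(c)) for c in grid]

# Print the first few rows: C, Langmuir q-hat, Freundlich q-hat
for c, q_l, q_f in curve[:5]:
    print(c, round(q_l, 5), round(q_f, 5))
```

Plotting the second and third columns against the first, together with the measured points, gives the comparison graph described on this sheet.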

The "Langmuir-Freundlich" isotherm has the form

$$q = q_{max} \frac{K_L c^n}{1 + K_L c^n}$$

For the data, evaluate this model.
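As a starting point for this exercise, here is a hedged Python sketch of the q-hat, RSS and r^2 calculation for the three-parameter model. It only evaluates the fit for parameter values you supply; choosing starting values and minimizing the RSS (with Solver or otherwise) is left to you. The function names and the unrounded q value of 1.094 are assumptions. With n = 1 the model reduces to the Langmuir equation, so the Langmuir parameters from the earlier sheet make a convenient sanity check.

```python
C = [0.059, 0.015, 0.01, 0.008, 0.001, 0.0006, 0.0]   # mg/L
# Assumption: 1.094 is the unrounded value behind the displayed 1.09 mg/g.
q = [51.0, 38.0, 20.0, 10.2, 4.36, 1.094, 0.0]        # mg/g


def langmuir_freundlich(c, qmax, KL, n):
    """q = qmax * KL*c^n / (1 + KL*c^n)"""
    x = KL * c ** n
    return qmax * x / (1 + x)


def fit_quality(qmax, KL, n):
    """Return (RSS, r^2) for the given trial parameters."""
    q_hat = [langmuir_freundlich(ci, qmax, KL, n) for ci in C]
    q_bar = sum(q) / len(q)
    rss = sum((qi - qh) ** 2 for qi, qh in zip(q, q_hat))
    tss = sum((qi - q_bar) ** 2 for qi in q)
    return rss, 1 - rss / tss


# Sanity check: n = 1 with the Langmuir parameters from the earlier sheet.
rss_n1, r2_n1 = fit_quality(73.26604, 42.60169, 1.0)
print("n = 1:", rss_n1, r2_n1)
```

Since the Langmuir fit is the special case n = 1, a properly minimized three-parameter fit cannot have a higher RSS than the Langmuir fit.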
