Analysis of Support Vector Regression for Approximation of Complex Engineering Analyses

Stella M. Clarke
Department of Industrial & Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802

Jan H. Griebsch
Doctoral Candidate, Lehrstuhl für Effiziente Algorithmen, The Technical University of Munich, Munich, Germany

Timothy W. Simpson
e-mail: tws8@psu.edu
Departments of Mechanical & Nuclear and Industrial & Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802

A variety of metamodeling techniques have been developed in the past decade to reduce the computational expense of computer-based analysis and simulation codes. Metamodeling is the process of building a model of a model to provide a fast surrogate for a computationally expensive computer code. Common metamodeling techniques include response surface methodology, kriging, radial basis functions, and multivariate adaptive regression splines. In this paper, we investigate support vector regression (SVR) as an alternative technique for approximating complex engineering analyses. The computationally efficient theory behind SVR is reviewed, and SVR approximations are compared against the aforementioned four metamodeling techniques using a test bed of 26 engineering analysis functions. SVR achieves more accurate and more robust function approximations than the four metamodeling techniques, and shows great potential for metamodeling applications, adding to the growing body of promising empirical performance of SVR. [DOI: 10.1115/1.1897403]

Journal of Mechanical Design, November 2005, Vol. 127, p. 1077. Copyright © 2005 by ASME.
2.3 Kriging. The kriging model postulates a combination of a known function and departures of the form

$$ y(\mathbf{x}) = f(\mathbf{x}) + Z(\mathbf{x}) \qquad (7) $$

where y(x) is the unknown function of interest, f(x) is a known polynomial function, which is often taken as a constant, and Z(x) is the correlation function, which is a realization of a stochastic process with mean zero, variance σ², and nonzero covariance [26]. Flexibility in kriging is achieved through a variety of spatial correlation functions [27]; in this work, we use a Gaussian correlation function [26] and the fitting procedures described in Ref. [28]. Comparisons between kriging and response surfaces can be found in Refs. [2,16,28], and a recent investigation of kriging models to support conceptual design can be found in Ref. [29].
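To make the Gaussian correlation concrete, the sketch below computes a spatial correlation matrix of the common Gaussian form from Ref. [26], R_ij = exp(-Σ_k θ_k (x_ik - x_jk)²). This is our illustration, not the Fortran fitting code used in the paper, and the θ values are hypothetical.

```python
import numpy as np

def gaussian_correlation(X, theta):
    """Gaussian spatial correlation for Z(x) in Eq. (7):
    R_ij = exp(-sum_k theta_k * (x_ik - x_jk)^2)."""
    diff = X[:, None, :] - X[None, :, :]        # pairwise differences, shape (n, n, d)
    return np.exp(-np.einsum("ijk,k->ij", diff**2, theta))

# example with hypothetical theta values: larger theta -> faster decay in that dimension
X = np.random.rand(10, 2)                       # 10 sample points in 2 dimensions
R = gaussian_correlation(X, theta=np.array([5.0, 0.5]))
```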
2.4 Multivariate Adaptive Regression Splines. Multivariate adaptive regression splines (MARS) is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the inputs and the response.

A key assumption in this formulation is that there exists a function f(x) that can approximate all input pairs (x_i, y_i) with ε precision; however, this may not be the case, or perhaps some error allowance is desired. Thus, the slack variables ξ_i and ξ_i* can be incorporated into the optimization problem to yield the following formulation [10]:

$$ \text{Minimize} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{\ell}(\xi_i + \xi_i^*) $$
$$ \text{subject to} \quad \begin{cases} y_i - \langle\mathbf{w},\mathbf{x}_i\rangle - b \le \varepsilon + \xi_i \\ \langle\mathbf{w},\mathbf{x}_i\rangle + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \qquad (10) $$

where the constant C > 0 determines the tradeoff between flatness (small w) and the degree to which deviations larger than ε are tolerated (see Fig. 1), and ℓ is the number of samples. This is referred to as the ε-insensitive loss function proposed by Vapnik.
The constrained problem in Eq. (10) can be solved by constructing a Lagrangian, introducing nonnegative multipliers α_i, α_i*, η_i, and η_i* for the constraints:

$$ L = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{\ell}(\xi_i + \xi_i^*) - \sum_{i=1}^{\ell}\alpha_i\left(\varepsilon + \xi_i - y_i + \langle\mathbf{w},\mathbf{x}_i\rangle + b\right) - \sum_{i=1}^{\ell}\alpha_i^*\left(\varepsilon + \xi_i^* + y_i - \langle\mathbf{w},\mathbf{x}_i\rangle - b\right) - \sum_{i=1}^{\ell}(\eta_i\xi_i + \eta_i^*\xi_i^*) \qquad (11) $$

At the optimum, the partial derivatives of L with respect to the primal variables (b, w, ξ_i, ξ_i*) must vanish:

$$ \partial_b L = \sum_{i=1}^{\ell}(\alpha_i^* - \alpha_i) = 0 \qquad (12) $$
$$ \partial_{\mathbf{w}} L = \mathbf{w} - \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\mathbf{x}_i = 0 \qquad (13) $$

$$ \partial_{\xi_i} L = C - \alpha_i - \eta_i = 0 \qquad (14) $$

$$ \partial_{\xi_i^*} L = C - \alpha_i^* - \eta_i^* = 0 \qquad (15) $$

Substituting Eqs. (12)–(15) into Eq. (11) yields the optimization problem in dual form:

$$ \text{Maximize} \quad -\frac{1}{2}\sum_{i,j=1}^{\ell}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\langle\mathbf{x}_i,\mathbf{x}_j\rangle - \varepsilon\sum_{i=1}^{\ell}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{\ell} y_i(\alpha_i - \alpha_i^*) $$
$$ \text{subject to} \quad \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i,\ \alpha_i^* \in [0, C] \qquad (16) $$

From Eq. (13),

$$ \mathbf{w} = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\mathbf{x}_i \qquad (17) $$

and so the linear regression in Eq. (8) becomes

$$ f(\mathbf{x}) = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\langle\mathbf{x}_i,\mathbf{x}\rangle + b \qquad (18) $$

Thus, the training algorithm and the regression function f(x) can be expressed in terms of the dot product ⟨x_i, x⟩. Transforming the optimization problem into dual form yields two advantages. First, the optimization problem is now a quadratic programming problem with linear constraints and a positive definite Hessian matrix, ensuring a unique global optimum. For such problems, highly efficient and thoroughly tested quadratic solvers exist. Second, as can be seen in Eq. (16), the input vectors only appear inside the dot product. The dot product of each pair of input vectors is a scalar that can be preprocessed and stored in the quadratic matrix M_ij = (⟨x_i, x_j⟩)_ij. In this way, the dimensionality of the input space is hidden from the remaining computations, providing a means for addressing the curse of dimensionality [13].
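As an illustration of how Eqs. (16)–(18) fit together, the sketch below solves the dual problem with a general-purpose solver (SciPy's SLSQP standing in for the dedicated quadratic solvers mentioned above) and recovers w from Eq. (17). The crude estimate of b is our simplification; the paper obtains b from the Karush-Kuhn-Tucker conditions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_linear_svr(X, y, C=10.0, eps=0.1):
    """Solve the dual of Eq. (16) and return (w, b) for Eq. (18)."""
    n = len(y)
    M = X @ X.T                                   # M_ij = <x_i, x_j>, precomputed once

    def neg_dual(z):                              # minimize the negated dual objective
        a, a_star = z[:n], z[n:]
        d = a - a_star
        return 0.5 * d @ M @ d + eps * np.sum(a + a_star) - y @ d

    res = minimize(neg_dual, np.zeros(2 * n), method="SLSQP",
                   bounds=[(0.0, C)] * (2 * n),
                   constraints={"type": "eq",
                                "fun": lambda z: np.sum(z[:n] - z[n:])})
    a, a_star = res.x[:n], res.x[n:]
    w = (a - a_star) @ X                          # Eq. (17)
    b = np.mean(y - X @ w)                        # crude offset; KKT conditions are exact
    return w, b
```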
Nonlinear approximations are obtained by applying a nonlinear transformation on the input vectors. This transformation is referred to as the kernel function and is represented by k(x, x′), where x and x′ are each input vectors. Table 1 lists common kernel functions, where the kernel function substitution maintains the elegance of the optimization method used for linear SVR [34]. Applying the kernel function to the dot product of input vectors, we obtain:

$$ \text{Maximize} \quad -\frac{1}{2}\sum_{i,j=1}^{\ell}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(\mathbf{x}_i,\mathbf{x}_j) - \varepsilon\sum_{i=1}^{\ell}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{\ell} y_i(\alpha_i - \alpha_i^*) $$
$$ \text{subject to} \quad \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i,\ \alpha_i^* \in [0, C] \qquad (19) $$

Replacing the dot product in Eq. (18), the SVR approximation becomes

$$ f(\mathbf{x}) = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\,k(\mathbf{x}_i,\mathbf{x}) + b \qquad (20) $$

The kernel function k(x_i, x_j) can be preprocessed, and the results are stored in the kernel matrix, K = (k(x_i, x_j))_{i,j=1}^{n}. The kernel matrix must be positive definite in order to guarantee a unique optimal solution to the quadratic optimization problem. The kernel functions presented in Table 1 yield positive definite kernel matrices [4]. Thus, by using the kernel function and corresponding kernel matrix, nonlinear function approximations can be achieved with SVR while maintaining the simplicity and computational efficiency of linear SVR approximations.

Now that the theory behind SVR has been reviewed, we can begin to investigate the application of SVR to engineering analyses. First, a simple one-dimensional example is presented next for illustration purposes, and then a test bed of more realistic engineering problems is discussed in Section 4.

3.3 A Simple One-Dimensional Example. This section illustrates the application of SVR to a one-dimensional example, which means that the length of each input vector is only one. The function to be approximated is an eighth-order polynomial from Su and Renaud [35] and is shown in Fig. 2. The five training points noted in the figure are used to fit the SVR.
The Gaussian kernel matrix for the five training points is

$$ K = \begin{pmatrix} 1.0000 & 0.8749 & 0.5858 & 0.3003 & 0.1178 \\ 0.8749 & 1.0000 & 0.8749 & 0.5858 & 0.3003 \\ 0.5858 & 0.8749 & 1.0000 & 0.8749 & 0.5858 \\ 0.3003 & 0.5858 & 0.8749 & 1.0000 & 0.8749 \\ 0.1178 & 0.3003 & 0.5858 & 0.8749 & 1.0000 \end{pmatrix} $$

Fig. 3 Flowchart of the SVR algorithm

This matrix is used in the quadratic optimization problem stated in Eq. (19). The resulting solution yields all the variables required to calculate the approximating function. The vector of differences of Lagrange multipliers is

$$ (\alpha_i - \alpha_i^*) = \begin{pmatrix} 433.8411 \\ 924.0233 \\ 999.9999 \\ 647.1798 \\ 229.4055 \end{pmatrix} $$
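The entries of K above are easy to reproduce approximately. Assuming a Gaussian kernel of the form k(x, x′) = exp(-‖x - x′‖² / (2r²)), as in Gunn's toolbox, with the radius r = 9.7 reported below and five equally spaced training points five units apart (the spacing and kernel form are our assumptions), the sketch below gives almost exactly the matrix shown:

```python
import numpy as np

def gaussian_kernel_matrix(X, radius):
    """K_ij = exp(-||x_i - x_j||^2 / (2 * radius^2)); kernel form
    assumed to match Gunn's toolbox."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * radius**2))

# five 1D training points spaced 5 apart (assumed spacing), radius 9.7
X = np.arange(920.0, 941.0, 5.0).reshape(-1, 1)
print(np.round(gaussian_kernel_matrix(X, radius=9.7), 4))  # ~0.8756 vs 0.8749, etc.
```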
The difference of the Lagrange multipliers is then used with the training points to calculate the weight vector, w = Σ_{i=1}^{ℓ}(α_i - α_i*)x_i, and the offset b. The offset b is calculated using the Karush-Kuhn-Tucker conditions (see Ref. [36] for more details), and all of these variables are then substituted into Eq. (20) to yield the SVR approximation.

The default value of 1×10⁻⁴ was used for ε, and the radius of the Gaussian kernel was manually optimized to a value of 9.7 for the given training data. This involved repeatedly updating the Gaussian radius and running the SVR algorithm until the root mean square error (RMSE) was close to its minimum. Figure 4 shows the SVR function approximation compared to the actual function (Fig. 4: Fit of one-dimensional function using SVR). We can see a very close fit between the SVR approximation and the actual function within the range of the training data, but the fit starts to deteriorate outside of this range, as one might expect.
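The manual radius tuning described above is easy to emulate. The sketch below fits an ε-insensitive SVR with a Gaussian kernel at ε = 1×10⁻⁴ and scans candidate radii for the lowest RMSE on held-out points. Scikit-learn's SVR, the toy test function, and the choice C = 1000 are our stand-ins for Gunn's Matlab setup, not the authors' code.

```python
import numpy as np
from sklearn.svm import SVR   # stand-in for Gunn's Matlab toolbox

def f(x):                     # hypothetical 1D test function, not Su and Renaud's polynomial
    return np.sin(x / 4.0) + 0.05 * x

x_train = np.arange(920.0, 941.0, 5.0).reshape(-1, 1)        # five training points
x_test = np.linspace(920.0, 945.0, 16).reshape(-1, 1)        # 16 evenly spaced test points

best_rmse, best_radius = np.inf, None
for radius in np.arange(1.0, 20.0, 0.1):                     # manual-style radius sweep
    model = SVR(kernel="rbf", gamma=1.0 / (2.0 * radius**2), # gamma encodes the radius
                C=1000.0, epsilon=1e-4)
    model.fit(x_train, f(x_train).ravel())
    rmse = np.sqrt(np.mean((f(x_test).ravel() - model.predict(x_test)) ** 2))
    if rmse < best_rmse:
        best_rmse, best_radius = rmse, radius

print(f"best radius = {best_radius:.1f}, RMSE = {best_rmse:.4g}")
```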
To assess the accuracy of these results, the three error equations in Table 2 are calculated, where n_error is the number of random test points used, y_i is the actual function value, and ŷ_i is the predicted value from the function approximation method. For comparison, the response surface and kriging models constructed in Ref. [37] are included in this assessment. The resulting errors for SVR, the second-order RSM model, and the kriging model are listed in Table 3; these errors are based on 16 evenly spaced test points between x = 920 and x = 945. The SVR method achieved the lowest error values in all three categories; hence, SVR provides a very accurate approximation for this one-dimensional example.

Table 3 Error comparisons between approximation methods for the one-dimensional example
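For reference, the three error measures can be computed as follows, assuming Table 2 (not reproduced in this excerpt) uses the standard definitions:

```python
import numpy as np

def error_measures(y_actual, y_predicted):
    """RMSE, maximum absolute error, and average absolute error over
    the n_error test points (standard definitions assumed)."""
    err = np.abs(np.asarray(y_actual) - np.asarray(y_predicted))
    return {"RMSE": float(np.sqrt(np.mean(err**2))),
            "MaxError": float(err.max()),
            "AvgError": float(err.mean())}
```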
4 Approximation Test Bed and Comparison

The test bed of engineering analyses used to benchmark SVR against other approximation methods is listed in the Appendix. This test bed was initially created in Ref. [37] to compare the predictive capability of kriging models. The 26 functions to be approximated are derived from six engineering problems typically used to test optimization algorithms: a two-bar truss [38], a three-bar truss [38], a two-member frame [39], a helical compression spring [40], a welded beam [41], and a pressure vessel [40].

4.1 Training and Testing Procedure. The procedure used to train and test each approximation technique is summarized as follows.

Step 1: Generate training data and test data. To minimize interactions between metamodel type and the choice of experimental design, six different sets of training data were generated for each function using six different experimental designs, as listed in Table 4. A review and detailed comparison of these experimental designs can be found in Ref. [42]. The number of training points generated for each problem (n1) is also shown. Since there are six sets of training data for each of the 26 functions, a total of 6 × 26 = 156 function approximations are constructed using each approximation technique.

Additional test data is generated as shown in Table 4 using a random Latin hypercube design [43–46]. The number of test points is very large to ensure a thorough analysis of the accuracy of the resulting approximation throughout the performance space.

Step 2: Construct metamodels using training data. Every training data set is submitted to each metamodeling algorithm to generate a function approximation. For SVR, Gunn's [13] support vector machine Matlab code is implemented as in the one-dimensional example (see Fig. 3). The Gaussian kernel function was selected due to its good features and strong learning capability [47], and the radius of the Gaussian kernel function is manually optimized for each function for each data set. The default value of 1×10⁻⁴ was used for ε, as we found that this parameter had little impact on the resulting SVR model. For kriging, RSM, RBF, and MARS, Fortran algorithms from Ref. [37] were utilized.

Step 3: Compare the accuracy and robustness of each metamodel. Additional test data is used to test the accuracy of the function approximation generated by each of the five metamodeling techniques. For a given function and a given metamodel, the inputs of the test data are submitted to the function approximation generated in Step 2. The outputs are the predicted values of the original function according to the corresponding metamodeling technique. The difference between each predicted value ŷ_i and the actual function value y_i is calculated as the error for that test point.

As in the one-dimensional example, the three error measures listed in Table 2 are calculated to assess accuracy: (i) RMSE, (ii) maximum error, and (iii) average error. The RMSE gives an indication of how accurate the approximation was overall, while the maximum error can reveal the presence of regional areas of poor approximation; however, due to the different magnitudes of the 26 test functions, these error statistics enable comparisons of algorithms only within each function. It is desirable to compare the effectiveness of each algorithm across all of the functions so that conclusions can be drawn; however, normalizing the errors against their nominal values does not provide a realistic comparison, since many nominal values are close to zero, which exacerbates the relative value of the error. A common normalized statistic is the correlation coefficient R², but it is only suited for linear approximations [48]. Although correlation coefficients have been devised for nonlinear approximations (see Ref. [48]), they apply only to simple functions and not to the complex equations in the test bed. Thus, to enable comparisons of SVR to the four other metamodeling techniques, the average percentage difference in error values between each algorithm and SVR is computed, as sketched below. In this manner, the error in the SVR approximations becomes the benchmark to which all other metamodels are compared. Positive percentage values imply that the corresponding approximation had larger errors than SVR, while negative values imply that the approximation had smaller errors. These results are presented and discussed in Section 4.2.1.
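A minimal sketch of this benchmark statistic, with the averaging convention assumed (the excerpt does not spell out the exact formula): for one error measure, compute the percentage difference from SVR per function, then average across functions.

```python
import numpy as np

def avg_pct_difference_vs_svr(errors_technique, errors_svr):
    """Average percentage difference in an error measure relative to SVR:
    positive means the technique had larger errors than SVR."""
    e_t = np.asarray(errors_technique, dtype=float)   # one value per function
    e_s = np.asarray(errors_svr, dtype=float)
    return float(np.mean((e_t - e_s) / e_s) * 100.0)
```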
The robustness of an approximation method is indicated by the variance between its error values across different sample sets [2]. The capability to repeat a function approximation with similar accuracies (i.e., similar errors) increases the reliability of the results and the robustness of the approximation method. To test for robustness, SVR is again used as a benchmark, and the standard deviation of each technique's errors is computed using the following steps:
1. For a given approximation technique, find the standard deviation of each error measure (i.e., RMSE, MAE, AAE) across the six different training sets for each function.

2. Normalize each standard deviation against the standard deviation for SVR for the same function. This normalized standard deviation (NormStdDev) is calculated as:

$$ \text{NormStdDev} = \frac{\text{Std. Dev. for Given Technique} - \text{Std. Dev. for SVR}}{\text{Std. Dev. for SVR}} \qquad (22) $$

3. Average the NormStdDev across all functions for a given metamodel.
The normalized standard deviation reflects the variance in the error for a given approximation technique, relative to SVR, as sketched below. A positive value indicates a greater variance than SVR and hence less robustness, while a negative value indicates a lower variance than SVR and a more robust approximation method. These results are presented and discussed in Section 4.2.2.
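A sketch of the three steps, with the sample standard deviation assumed (the excerpt does not state whether the sample or population form was used):

```python
import numpy as np

def norm_std_dev(errors_technique, errors_svr):
    """Eq. (22) for one function and one error measure: std. dev. across
    the six training sets, normalized against SVR."""
    s_tech = np.std(errors_technique, ddof=1)   # sample std over six training sets
    s_svr = np.std(errors_svr, ddof=1)
    return (s_tech - s_svr) / s_svr

# Step 3: average across all 26 functions for a given metamodel
# avg_norm = np.mean([norm_std_dev(e_t, e_s) for e_t, e_s in per_function_errors])
```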
Step 4: Compare the computational efficiency and transparency of each metamodeling technique. Following Jin et al. [2], the computational efficiency refers to the computational effort required to construct the metamodel and predict response values for a set of new points using the metamodel, and the transparency is the capability of providing information concerning the contributions of different variables and the interactions among variables. The five metamodeling techniques are compared on these two aspects in Section 4.2.3.
4.2 Analysis of Results

4.2.1 Accuracy Results. Figure 5 shows the percentages by which average error values were higher than the corresponding error value for SVR. These percentage differences have been averaged over all 26 approximated functions. Except for the overall maximum error for kriging, SVR has achieved lower average error values (i.e., RMSE, average error, and maximum error) compared to the four other approximation techniques. As previously mentioned, lower overall RMSE values represent good global function approximation across the performance space, while lower overall maximum error values reflect the absence of poorly approximated regions in the performance space. Thus, the results imply that kriging models avoid areas of poor approximation, but they do not perform as well globally as the SVR approximations.

In general, SVR has outperformed the four other approximation techniques, giving lower overall error values. Kriging achieved the next best overall performance, followed by MARS, RSM, and finally RBF. These trends are consistent with those observed in Ref. [37], but we note that RBF has performed much better in other studies [2,25]; we are still investigating why this has occurred.

To further compare the performance of each algorithm according to the type of function being modeled, the three error statistics for linear and nonlinear function approximations were considered separately. Table 5 shows the number of times each metamodel type performed the best (i.e., achieved the lowest corresponding error statistic) for nonlinear and linear function approximations.

Table 5 Frequency with which each metamodel performed best for each error measure

SVR achieved the highest frequency of best performances for the nonlinear functions by a significant margin, whereas kriging outperforms more often for the linear functions by a smaller margin. These results are further confirmed in Figs. 6 and 7, which show the comparison of errors relative to SVR for the linear and nonlinear functions, respectively.

This separation into linear and nonlinear functions reveals the stronger performance of kriging in approximating linear functions, especially in the maximum error category, indicating good general performance over the entire input space. However, the performance of RSM for linear functions is not a significant improvement over its performance with nonlinear functions, as would be expected. It has been noticed that the errors produced by RSM for the linear functions within the spring test bed problem were particularly large, which greatly influenced this poor result when the errors were averaged. Further investigation is needed to understand why RSM did not perform as well as expected for the linear function approximations.

4.2.2 Robustness Results. Figure 8 presents the normalized standard deviations of the errors for each approximation technique, relative to SVR. The results indicate that SVR is the most robust of the five approximation techniques, reinforcing the validity of the error values obtained in Fig. 5. Kriging is the second most robust metamodeling technique. The variance between the errors for RSM is very large when measured relative to SVR, indicating very inconsistent performance. This could reflect the fact that RSM is most suitable for linear approximations only, performing well in these cases and poorly in nonlinear cases. The relatively high robustness of SVR is followed by kriging, MARS, RBF, and finally RSM.
4.2.3 Computational Efficiency and Transparency. We end our analysis with a comparison of the computational efficiency and transparency of the five metamodeling techniques. The time to fit an SVR approximation was on the order of a few seconds using Gunn's [13] Matlab code running on a 1.8 GHz desktop PC, even with the optimization process, and prediction with a fitted SVR approximation took less than a second. The fitting and prediction times for SVR were very comparable to those for MARS, while RSM and RBF were much faster, taking less than a second each. The least computationally efficient were the kriging models, which could take several minutes to fit depending on the dimensionality of the problem and the number of samples; fortunately, prediction with the kriging models took less than a second each. We refer the reader to Ref. [2] for a more detailed investigation of the effect of problem size and sample size on the computational efficiency of kriging, RSM, RBF, and MARS.

RSM provides the most transparency in terms of the functional relationship and the factor contributions; see Eqs. (4) and (5). MARS provides some transparency since the MARS models can be recast into the form [30]

$$ y = a_0 + \sum_{K_m=1} f_i(x_i) + \sum_{K_m=2} f_{ij}(x_i, x_j) + \sum_{K_m=3} f_{ijk}(x_i, x_j, x_k) + \cdots \qquad (23) $$

where the first sum is over all basis functions that involve only a single variable; the second is over all basis functions that involve two variables, representing two-variable interactions, if present; the third is over three variables; and so on. RBF and SVR have explicit function equations, Eqs. (6) and (20), respectively, but the individual factor contributions are not clear, making them even less transparent. The kriging models are the least transparent; however, the theta values can be interpreted with some practice [2,27,49].

5 Conclusions and Future Work

In conclusion, the theory behind SVR has been presented and shown to possess the desirable qualities of mathematical and algorithmic elegance, producing an actual approximating function as opposed to a trained black box. In comparison to four common approximating techniques, SVR had the best overall performance for the test bed of 26 engineering analysis functions. Only kriging outperformed SVR, and only in the category of average maximum error. The strong performance of the SVR approximations was reinforced through relatively small variances between error values, indicating that SVR also yields a more robust approximation. SVR provides a good compromise, offering prediction accuracy and robustness comparable to a kriging model with computational efficiency and transparency near those of an RSM or RBF approximation. These results add to the growing body of promising empirical performance of SVR, and we are currently investigating the theoretical foundations of all five metamodeling techniques in more detail to better understand where SVR obtains its strength.

The SVR implementation employed produced successful results; however, better results using SVR are anticipated through increased attention to the SVR algorithm itself and to the model parameters selected. The Matlab algorithm with which the SVR results were obtained is not as efficient as other available algorithms, such as SVMlight and mySVM (http://www-ai.cs.uni-dortmund.de/), which are written in C and are more efficient than the relatively slow Matlab implementation; however, users are limited in the amount of tweaking they can do to that software. A better solution would be to code an entire SVR algorithm in C and employ any of a number of commercially available quadratic solvers to give users more flexibility in fitting and using SVR.

In addition, the SVR implemented in our experiments consistently used a Gaussian kernel function that was manually optimized for each set of training data. Automatically optimizing the