
Analysis of Support Vector Regression for Approximation of Complex Engineering Analyses

Stella M. Clarke
Department of Industrial & Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802

Jan H. Griebsch
Doctoral Candidate, Lehrstuhl für Effiziente Algorithmen, The Technical University of Munich, Munich, Germany

Timothy W. Simpson¹
e-mail: tws8@psu.edu
Departments of Mechanical & Nuclear and Industrial & Manufacturing Engineering, The Pennsylvania State University, University Park, PA 16802

A variety of metamodeling techniques have been developed in the past decade to reduce the computational expense of computer-based analysis and simulation codes. Metamodeling is the process of building a "model of a model" to provide a fast surrogate for a computationally expensive computer code. Common metamodeling techniques include response surface methodology, kriging, radial basis functions, and multivariate adaptive regression splines. In this paper, we investigate support vector regression (SVR) as an alternative technique for approximating complex engineering analyses. The computationally efficient theory behind SVR is reviewed, and SVR approximations are compared against the aforementioned four metamodeling techniques using a test bed of 26 engineering analysis functions. SVR achieves more accurate and more robust function approximations than the four metamodeling techniques, and shows great potential for metamodeling applications, adding to the growing body of promising empirical performance of SVR. [DOI: 10.1115/1.1897403]

1 Introduction

Much of today's engineering analysis requires running complex and computationally expensive analysis and simulation codes, such as finite element analysis and computational fluid dynamics models. Despite continuing increases in computer processor speeds and capabilities, the time and computational cost of running complex engineering codes keeps pace. A way to overcome this problem is to generate an approximation of the complex analysis code that describes the process accurately enough, but at a much lower cost. Such approximations are often called metamodels in that they provide a "model of the model" [1]. Mathematically, if the inputs to the actual computer analysis are supplied in vector x, and the outputs from the analysis in vector y, then the true computer analysis code evaluates

y = f(x)    (1)

where f(x) is a complex engineering analysis function. The computationally efficient metamodel approximation is

\hat{y} = g(x)    (2)

such that

y = \hat{y} + \varepsilon    (3)

where \varepsilon includes both approximation and random errors.
There currently exist a number of metamodeling techniques to approximate f(x) with g(x), such as polynomial response surface models, multivariate adaptive regression splines, radial basis functions, kriging, and neural networks; a recent comparison of the first four of these metamodeling techniques can be found in Ref. [2]. All of these techniques are capable of function approximation, but they vary in their accuracy, robustness, computational efficiency, and transparency. For example, although neural networks are able to approximate very complex models well, they have the two disadvantages of (i) being a "black box" approach and (ii) having a computationally expensive training process [3]. "Black box" means that little can be seen and understood about the model, because an exact function is not generated, only a trained "box" that accepts inputs and returns outputs.
In this paper, support vector regression (SVR) is investigated as an alternative technique for approximating complex engineering analyses. SVR is a particular implementation of support vector machines (SVM), "a principled and very powerful method that in the few years since its introduction has already outperformed most other systems in a wide variety of applications" [4]. In many applications, SVMs are known to produce equally good, if not better, results than neural networks, while being computationally cheaper and producing an actual mathematical function (i.e., no "black box"). Hearst [5] comments that SV learning is based on "some beautifully simple ideas" and provides "a clear intuition of what learning from examples is about," and that SVMs "can lead to high performances in practical applications." In addition, Takeuchi and Collier [6] state that SVMs "have been applied very successfully in the past to several traditional classification tasks such as text classification." Other successful applications of SVMs have included handwritten character and digit recognition, face detection, text categorization, and object detection in machine vision [7]; SVR has also been used to estimate minimum zone straightness and flatness tolerances [8].
The SVM algorithm is a nonlinear generalization of the generalized portrait algorithm developed in Russia in the 1960s [9]. In its present form, SVM was developed at AT&T Bell Laboratories by Vapnik and co-workers in the early 1990s [10]. Smola and Schölkopf [11] acknowledge the success of SVMs since this time and add that "in regression and time series prediction applications, excellent performances were soon obtained." The application of the support vector approach to regression retains much of the elegance of SVMs, but adds the capability to approximate linear and nonlinear functions. The resulting support vector regression (SVR) is showing promising empirical performance [12,13], and in this paper we seek to provide additional empirical evidence by investigating the performance of SVR in comparison to four metamodeling techniques commonly used in engineering design: kriging, response surfaces, radial basis functions, and multivariate adaptive regression splines.
The remainder of this paper is as follows. An overview of the four metamodeling techniques is given next. In Section 3, support vector regression is discussed, beginning with linear function approximations and then nonlinear approximations. A simple one-dimensional function approximation is presented for illustration purposes in Section 3.3. A more thorough analysis is documented in Section 4, wherein a test bed of 26 different engineering analyses is used to compare the five metamodeling techniques, including SVR.

¹Corresponding author.
Contributed by the Design and Automation Committee for publication in the JOURNAL OF MECHANICAL DESIGN. Manuscript received October 7, 2003; revised August 13, 2004. Associate Editor: Wei Chen.



In Section 4, the test bed is introduced, and the method for comparing the techniques is explained, followed by a discussion of the results. Section 5 contains closing remarks and outlines future research in using SVR for function approximation.

2 Existing Metamodeling Techniques


This section briefly presents an overview of the four metamodeling techniques against which SVR is compared: (i) response surface methodology, (ii) radial basis functions, (iii) kriging, and (iv) multivariate adaptive regression splines. These techniques were chosen based on their widespread use in engineering design [14] and availability of algorithms.
2.1 Response Surface Methodology. Response surface methodology (RSM) approximates functions by using the least squares method on a series of points in the design variable space. Low-order polynomials are the most widely used response surface approximating functions [14]. Equation (4) is a first-order polynomial that can be used for approximating functions with little to no curvature. Equation (5) is a second-order polynomial that includes quadratic terms and all two-factor interactions:

\hat{y} = b_0 + \sum_{i=1}^{k} b_i x_i    (4)

\hat{y} = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k} b_{ii} x_i^2 + \sum_{i<j} b_{ij} x_i x_j    (5)

The constants b_0, b_i, b_{ii}, b_{ij} are determined by least squares regression; more information can be found in Ref. [15]. Response surfaces have become popular in mechanical design, having been used in a variety of applications including compliant mechanism design [16], robust design [17], multidisciplinary optimization [18], adaptive strategies for global optimization [19], manufacturing analysis [20], and game theoretic approaches to collaborative design [21].
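To make Eqs. (4) and (5) concrete, the following Python/NumPy sketch fits a second-order response surface by ordinary least squares. It is illustrative only, not the Fortran implementation compared in Section 4; the data, the number of design variables, and the function names are hypothetical.

import numpy as np

def quadratic_design_matrix(X):
    # Columns: 1, x_i, x_i^2, and the two-factor interactions x_i * x_j of Eq. (5)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)

# Hypothetical training data: 20 samples of k = 2 design variables and a response y
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

A = quadratic_design_matrix(X)
b, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares estimates of b0, bi, bii, bij
y_hat = A @ b                               # response surface predictions at the training points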
2.2 Radial Basis Functions. Radial basis functions (RBF) attempt approximation by using a linear combination of radially symmetric functions [22,23]. Mathematically, the model can be expressed as

\hat{y} = \sum_i a_i \, \lVert X - X_{0i} \rVert    (6)

where a_i is a real-valued weight, and X_{0i} is the input vector. Radial basis functions have produced good approximations to arbitrary contours. For example, they have been successfully applied to electronic circuit simulation models [24] and the construction of metamodels for a desk lamp example [25].
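A minimal sketch of Eq. (6), assuming the radially symmetric basis is the Euclidean distance itself and that the weights a_i are chosen so the model interpolates the training responses; the data are hypothetical, and this is not the Fortran implementation used in the comparisons of Section 4.

import numpy as np

def rbf_fit(X_train, y_train):
    # Basis value between two points is the Euclidean distance ||x - X_0i|| of Eq. (6).
    # The distance matrix is nonsingular for distinct training points, so the weights
    # a_i follow from one linear solve (interpolation conditions).
    G = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=2)
    return np.linalg.solve(G, y_train)

def rbf_predict(X_new, X_train, a):
    G = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    return G @ a

# Hypothetical training data: 15 samples of 2 design variables
rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(15, 2))
y_train = np.sin(3.0 * X_train[:, 0]) + X_train[:, 1] ** 2

a = rbf_fit(X_train, y_train)
y_hat = rbf_predict(X_train, X_train, a)   # reproduces y_train at the training points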
2.3 Kriging. The kriging model postulates a combination of a known function and departures of the form

y(x) = f(x) + Z(x)    (7)

where y(x) is the unknown function of interest, f(x) is a known polynomial function, which is often taken as a constant, and Z(x) is the correlation function, which is a realization of a stochastic process with mean zero, variance \sigma^2, and nonzero covariance [26]. Flexibility in kriging is achieved through a variety of spatial correlation functions [27], and in this work, we use a Gaussian correlation function [26] and the fitting procedures described in Ref. [28]. Comparisons between kriging and response surfaces can be found in Refs. [2,16,28], and a recent investigation of kriging models to support conceptual design can be found in Ref. [29].
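As an illustration of the spatial correlation idea, the sketch below builds the Gaussian correlation matrix commonly associated with kriging, R_ij = exp(-sum_k theta_k (x_ik - x_jk)^2). It shows the correlation function only, with assumed theta values and hypothetical sample sites; it is not the fitting procedure of Ref. [28].

import numpy as np

def gaussian_correlation(X, theta):
    # R[i, j] = exp(-sum_k theta_k * (x_ik - x_jk)^2)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-np.einsum('ijk,k->ij', d2, theta))

# Hypothetical sample sites and correlation parameters (one theta per design variable)
X = np.array([[0.0, 0.0],
              [0.5, 0.2],
              [1.0, 0.9]])
theta = np.array([2.0, 0.5])
R = gaussian_correlation(X, theta)   # combined with a trend term to form the kriging predictor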
2.4 Multivariate Adaptive Regression Splines. Multivariate adaptive regression splines (MARS) is a nonparametric regression procedure that makes no assumption about the underlying functional relationship between the dependent and independent variables; instead, MARS constructs this relation from a set of coefficients and basis functions that are determined from regression data [30]. The input space is divided into regions containing their own regression equation; thus, MARS is suitable for problems with high input dimensions, where the curse of dimensionality would likely create problems for other techniques. For this work, we use the MARS algorithm developed by Chen et al. [31], Chen [32], and her colleagues.

3 Support Vector Regression

3.1 Linear Regression Using SVR. There are two basic aims in SVR. The first is to find a function f(x) that has at most \varepsilon deviation from each of the targets of the training inputs. For the linear case, f is given by

f(x) = \langle w \cdot x \rangle + b    (8)

where \langle a \cdot b \rangle is the dot product between a and b.
At the same time, we would like this function to be as flat as possible. Smola et al. [33] showed that choosing the flattest function in the feature space leads to a smooth function in the input space. Flatness in this sense means a small w in Eq. (8). This second aim is not as immediately intuitive as the first, but is nevertheless important in the formulation of the optimization problem used to construct the SVR approximation:

\text{Minimize} \quad \frac{1}{2}\lVert w \rVert^2
\text{subject to} \quad y_i - \langle w \cdot x_i \rangle - b \le \varepsilon, \qquad \langle w \cdot x_i \rangle + b - y_i \le \varepsilon    (9)

A key assumption in this formulation is that there exists a function f(x) that can approximate all input pairs (x_i, y_i) with \varepsilon precision; however, this may not be the case, or perhaps some error allowance is desired. Thus, the slack variables \xi_i and \xi_i^* can be incorporated into the optimization problem to yield the following formulation [10]:

\text{Minimize} \quad \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{\ell}(\xi_i + \xi_i^*)
\text{subject to} \quad y_i - \langle w \cdot x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle w \cdot x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0    (10)

where the constant C > 0 determines the tradeoff between flatness (small w) and the degree to which deviations larger than \varepsilon are tolerated (see Fig. 1), and \ell is the number of samples. This is referred to as the \varepsilon-insensitive loss function proposed by Vapnik et al. [12], which enables a sparse set of support vectors to be obtained for regression.

Fig. 1 Accounting for slack variables
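For reference, the \varepsilon-insensitive loss that underlies the formulation in Eq. (10) penalizes only residuals that fall outside the \varepsilon-tube. A small sketch with hypothetical values:

import numpy as np

def epsilon_insensitive_loss(y, f, eps):
    # Zero inside the eps-tube, linear outside it: max(0, |y - f| - eps)
    return np.maximum(0.0, np.abs(y - f) - eps)

# Hypothetical targets y, predictions f, and tube width eps
print(epsilon_insensitive_loss(np.array([1.0, 1.2, 2.0]),
                               np.array([1.05, 1.0, 1.0]), eps=0.1))
# residuals of 0.05, 0.2, and 1.0 give losses of 0.0, 0.1, and 0.9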



The optimization function and linear constraints in Eq. (10) can be written as the Lagrangian function

L = \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{\ell}(\xi_i + \xi_i^*) - \sum_{i=1}^{\ell}\alpha_i(\varepsilon + \xi_i - y_i + \langle w \cdot x_i \rangle + b) - \sum_{i=1}^{\ell}\alpha_i^*(\varepsilon + \xi_i^* + y_i - \langle w \cdot x_i \rangle - b) - \sum_{i=1}^{\ell}(\eta_i\xi_i + \eta_i^*\xi_i^*)    (11)

Through Lagrangian theory, necessary conditions for a solution to the original optimization problem are

\partial_b L = \sum_{i=1}^{\ell}(\alpha_i^* - \alpha_i) = 0    (12)

\partial_w L = w - \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\,x_i = 0    (13)

\partial_{\xi_i} L = C - \alpha_i - \eta_i = 0    (14)

\partial_{\xi_i^*} L = C - \alpha_i^* - \eta_i^* = 0    (15)

Substituting Eqs. (12)-(15) into Eq. (11) yields the optimization problem in dual form:

\text{Maximize} \quad -\frac{1}{2}\sum_{i,j=1}^{\ell}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\langle x_i \cdot x_j \rangle - \varepsilon\sum_{i=1}^{\ell}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{\ell} y_i(\alpha_i - \alpha_i^*)
\text{subject to} \quad \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i, \alpha_i^* \in [0, C]    (16)

From Eq. (13),

w = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\,x_i    (17)

and so the linear regression in Eq. (8) becomes

f(x) = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\langle x_i \cdot x \rangle + b    (18)

Thus, the training algorithm and the regression function f(x) can be expressed in terms of the dot product \langle x_i \cdot x \rangle.
Transforming the optimization problem into dual form yields two advantages. First, the optimization problem is now a quadratic programming problem with linear constraints and a positive definite Hessian matrix, ensuring a unique global optimum. For such problems, highly efficient and thoroughly tested quadratic solvers exist. Second, as can be seen in Eq. (16), the input vectors only appear inside the dot product. The dot product of each pair of input vectors is a scalar and can be preprocessed and stored in the quadratic matrix M_{ij} = \langle x_i \cdot x_j \rangle. In this way, the dimensionality of the input space is hidden from the remaining computations, providing means for addressing the curse of dimensionality [13].

3.2 Nonlinear Regression Using SVR. Another benefit of the dual form is that nonlinear function approximations can be achieved by replacing the dot product of input vectors with a nonlinear transformation on the input vectors. This transformation is referred to as the kernel function and is represented by k(x, x'), where x and x' are each input vectors. Table 1 lists common kernel functions, where the kernel function substitution maintains the elegance of the optimization method used for linear SVR [34].

Table 1 Common kernel functions [34]

Applying the kernel function to the dot product of input vectors, we obtain

\text{Maximize} \quad -\frac{1}{2}\sum_{i,j=1}^{\ell}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x_i, x_j) - \varepsilon\sum_{i=1}^{\ell}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{\ell} y_i(\alpha_i - \alpha_i^*)
\text{subject to} \quad \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*) = 0, \qquad \alpha_i, \alpha_i^* \in [0, C]    (19)

Replacing the dot product in Eq. (18), the SVR approximation becomes

f(x) = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\,k(x_i, x) + b    (20)

The kernel function k(x_i, x) can be preprocessed, and the results are stored in the kernel matrix, K = [k(x_i, x_j)]_{i,j=1}^{n}. The kernel matrix must be positive definite in order to guarantee a unique optimal solution to the quadratic optimization problem. The kernel functions presented in Table 1 yield positive definite kernel matrices [4]. Thus, by using the kernel function and corresponding kernel matrix, nonlinear function approximations can be achieved with SVR while maintaining the simplicity and computational efficiency of linear SVR approximations.
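To make Eqs. (19) and (20) concrete, the sketch below assembles a Gaussian kernel matrix and evaluates the prediction of Eq. (20). The kernel form exp(-||x - x'||^2 / (2 r^2)) is one common Gaussian parameterization (Table 1 is not reproduced here), and the training points, dual coefficients \alpha_i - \alpha_i^*, and offset b are placeholder values rather than the solution of the quadratic program.

import numpy as np

def gaussian_kernel(A, B, radius):
    # k(x, x') = exp(-||x - x'||^2 / (2 * radius^2)), one common Gaussian parameterization
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * radius ** 2))

def svr_predict(X_new, X_train, alpha_diff, b, radius):
    # Eq. (20): f(x) = sum_i (alpha_i - alpha_i^*) k(x_i, x) + b
    return gaussian_kernel(X_new, X_train, radius) @ alpha_diff + b

# Placeholder training inputs and dual coefficients (alpha_i - alpha_i^*); in practice these
# come from solving the quadratic program of Eq. (19)
X_train = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
alpha_diff = np.array([0.4, -0.9, 1.0, -0.6, 0.2])
b = 0.1

K = gaussian_kernel(X_train, X_train, radius=0.3)          # kernel matrix K = [k(x_i, x_j)]
y_hat = svr_predict(np.array([[0.4]]), X_train, alpha_diff, b, radius=0.3)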
Now that the theory behind SVR has been reviewed, we can begin to investigate the application of SVR to engineering analyses. First, a simple one-dimensional example is presented next for illustration purposes, and then a test bed of more realistic engineering problems is discussed in Section 4.

3.3 A Simple One-Dimensional Example. This section illustrates the application of SVR to a one-dimensional example, which means that the length of each input vector is only one. The function to be approximated comes from Su and Renaud [35] and is shown in Fig. 2. The five training points noted in the figure are used to fit the SVR.

Fig. 2 Eighth-order one-dimensional function

The eighth-order function shown in Fig. 2 is given by

f(x) = \sum_{i=1}^{9} a_i (x - 900)^{i-1}    (21)

where



a_1 = 659.23,   a_2 = 190.22,   a_3 = 17.802,   a_4 = 0.82691,
a_5 = 0.021885,   a_6 = 0.0003463,   a_7 = 3.2446 \times 10^{-6},
a_8 = 1.6606 \times 10^{-8},   a_9 = 3.5757 \times 10^{-11}

The Matlab code developed by Gunn [13] was used to execute the SVR algorithm. Figure 3 shows a flowchart of the implementation, which is discussed as follows.
As seen in Fig. 3, the kernel matrix is first calculated from the training points. Using the Gaussian kernel function presented in Table 1 for each combination of input vectors, the following kernel matrix is obtained:


K = \begin{bmatrix} 1.0000 & 0.8749 & 0.5858 & 0.3003 & 0.1178 \\ 0.8749 & 1.0000 & 0.8749 & 0.5858 & 0.3003 \\ 0.5858 & 0.8749 & 1.0000 & 0.8749 & 0.5858 \\ 0.3003 & 0.5858 & 0.8749 & 1.0000 & 0.8749 \\ 0.1178 & 0.3003 & 0.5858 & 0.8749 & 1.0000 \end{bmatrix}
Fig. 3 Flowchart of the SVR algorithm
This matrix is used in the quadratic optimization problem stated in Eq. (19). The resulting solution yields all the variables required to calculate the approximating function. The vector of differences of Lagrange multipliers is

\alpha_i^* - \alpha_i = \begin{bmatrix} 433.8411 \\ 924.0233 \\ 999.9999 \\ 647.1798 \\ 229.4055 \end{bmatrix}
The difference of Lagrange multipliers is then used with the training points to calculate the weight vector, w = \sum_{i=1}^{\ell}(\alpha_i - \alpha_i^*)\,x_i, and the offset b. The offset b is calculated using the Karush-Kuhn-Tucker conditions (see Ref. [36] for more details), and all of these variables are then substituted into Eq. (20) to yield the SVR approximation.
The default value of 1 \times 10^{-4} was used for \varepsilon, and the radius of the Gaussian kernel was manually optimized to a value of 9.7 for the given training data. This involved continually updating the Gaussian radius and running the SVR algorithm until the root mean square error (RMSE) was close to minimum.
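A comparable one-dimensional fit can be reproduced with scikit-learn's \varepsilon-insensitive SVR in place of Gunn's Matlab code. The sketch below uses a stand-in function and five illustrative training locations (not the paper's exact training points), the reported \varepsilon = 1 \times 10^{-4} and kernel radius of 9.7 mapped onto scikit-learn's gamma, and an assumed value of C, so it demonstrates the procedure rather than reproducing the paper's results.

import numpy as np
from sklearn.svm import SVR

def expensive_analysis(x):
    # Stand-in for the eighth-order function of Eq. (21); substitute the a_1..a_9 polynomial here
    return np.sin(0.2 * (x - 900.0)) * (x - 900.0)

# Five illustrative training locations spanning the range used in the error assessment
x_train = np.linspace(920.0, 945.0, 5).reshape(-1, 1)
y_train = expensive_analysis(x_train).ravel()

# epsilon-insensitive SVR with a Gaussian (RBF) kernel; gamma = 1 / (2 r^2) maps the kernel
# radius r = 9.7 onto scikit-learn's parameterization; C = 1000 is an assumed value
model = SVR(kernel="rbf", C=1000.0, epsilon=1e-4, gamma=1.0 / (2.0 * 9.7 ** 2))
model.fit(x_train, y_train)

x_test = np.linspace(920.0, 945.0, 200).reshape(-1, 1)
y_hat = model.predict(x_test)
rmse = np.sqrt(np.mean((y_hat - expensive_analysis(x_test).ravel()) ** 2))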



Figure 4 shows the SVR function approximation compared to the actual function. We can see a very close fit between the SVR approximation and the actual function within the range of the training data, but the fit starts to deteriorate outside of this range, as one might expect.

Fig. 4 Fit of one-dimensional function using SVR

To assess the accuracy of these results, the three error equations in Table 2 are calculated, where n_error is the number of random test points used, y_i is the actual function value and \hat{y}_i is the predicted value from the function approximation method. For comparison, the response surface and kriging models constructed in Ref. [37] are included in this assessment. The resulting errors for SVR, the second-order RSM model, and the kriging model are listed in Table 3; these errors are based on 16 evenly spaced test points between x = 920 and x = 945. The SVR method has achieved the lowest error values in all three categories; hence, SVR provides a very accurate approximation for this one-dimensional example.

Table 2 Error measures for accuracy assessment
Table 3 Error comparisons between approximation methods for one-dimensional example
4 Approximation Test Bed and Comparison

The test bed of engineering analyses used to benchmark SVR against other approximation methods is listed in the Appendix. This test bed was initially created in Ref. [37] to compare the predictive capability of kriging models. The 26 functions to be approximated are derived from six engineering problems typically used to test optimization algorithms: a two-bar truss [38], a three-bar truss [38], a two-member frame [39], a helical compression spring [40], a welded beam [41], and a pressure vessel [40].

4.1 Training and Testing Procedure. The procedure used to train and test each approximation technique is summarized as follows.
Step 1: Generate training data and test data. To minimize interactions between metamodel type and the choice of experimental design, six different sets of training data were generated for each function using six different experimental designs as listed in Table 4. A review and detailed comparison of these experimental designs can be found in Ref. [42]. The number of training points generated for each problem (n_1) is also shown. Since there are six sets of training data for each of the 26 functions, a total of 6 \times 26 = 156 function approximations are constructed using each approximation technique.
Additional test data is generated as shown in Table 4 using a random Latin hypercube design [43-46]. The number of test points is very large to ensure a thorough analysis of the accuracy of the resulting approximation throughout the performance space.

Table 4 Generation of training data and test data

Step 2: Construct metamodels using training data. Every training data set is submitted to each metamodeling algorithm to generate a function approximation. For SVR, Gunn's [13] support vector machine Matlab code is implemented as in the one-dimensional example (see Fig. 3). The Gaussian kernel function was selected due to its good features and strong learning capability [47], and the radius of the Gaussian kernel function is manually optimized for each function for each data set. The default value of 1 \times 10^{-4} was used for \varepsilon as we found that this parameter had little impact on the resulting SVR model. For kriging, RSM, RBF, and MARS, Fortran algorithms from Ref. [37] were utilized.
Step 3: Compare the accuracy and robustness of each metamodel. Additional test data is used to test the accuracy of the function approximation generated by each of the five metamodeling techniques. For a given function and a given metamodel, the inputs of the test data are submitted to the function approximation generated in Step 2. The outputs are the predicted values of the original function according to the corresponding metamodeling technique. The difference between each predicted value \hat{y}_i and the actual function value y_i is calculated as the error for that test point.
As in the one-dimensional example, the three error measures listed in Table 2 are calculated to assess accuracy: (i) RMSE, (ii) maximum error, and (iii) average error. The RMSE gives an indication of how accurate the approximation was overall, while the maximum error can reveal the presence of regional areas of poor approximation; however, due to the different magnitudes of the 26 test functions, these error statistics enable comparisons of algorithms only within each function. It is desirable to compare the effectiveness of each algorithm across all of the functions so that conclusions can be drawn; however, normalizing the errors against their nominal values does not provide a realistic comparison since many nominal values are close to zero, which exacerbates the relative value of the error. A common normalized statistic is the correlation coefficient R^2, but it is only suited for linear approximations [48]. Although correlation coefficients have been devised for nonlinear approximations (see Ref. [48]), they apply only to simple functions and not to the complex equations in the test bed. Thus, to enable comparisons of SVR to the four other metamodeling techniques, the average percentage difference in error values between each algorithm and SVR is computed. In this manner, the error in the SVR approximations becomes the benchmark to which all other metamodels are compared. Positive percentage values imply that the corresponding approximation had larger errors than SVR, while negative values imply that the approximation had smaller errors. These results are presented and discussed in Section 4.2.1.
The robustness of an approximation method is indicated by the variance between its error values across different sample sets [2]. The capability to continually repeat a function approximation with similar accuracies (i.e., similar errors) increases the reliability of the results and the robustness of the approximation method. To test for robustness, SVR is again used as a benchmark to examine the robustness of each approximation technique, and the standard deviation is computed to indicate robustness using the following steps:



1. For a given approximation technique, find the standard deviation of each error (i.e., RMSE, MAE, AAE) across the six different training sets for each function.
2. Normalize each standard deviation against the standard deviation for SVR for the same function. This normalized standard deviation (NormStdDev) is calculated as:

\text{Norm Std Dev} = \frac{\text{Std. Dev. for Given Technique} - \text{Std. Dev. for SVR}}{\text{Std. Dev. for SVR}}    (22)

3. Average the NormStdDev across all functions for a given metamodel.

The normalized standard deviation reflects the variance in the error for a given approximation technique, relative to SVR. A positive value indicates a greater variance than SVR and hence less robustness, while a negative value indicates a lower variance than SVR and a more robust approximation method. These results are presented and discussed in Section 4.2.2.
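The three-step robustness calculation and Eq. (22) can be written compactly; a sketch assuming the error values are stored as arrays of shape (functions x training sets), with hypothetical numbers:

import numpy as np

def normalized_std_dev(err_technique, err_svr):
    # err_technique, err_svr: arrays of shape (n_functions, n_training_sets) holding one
    # error measure (e.g., RMSE) per function and per training set
    std_t = err_technique.std(axis=1, ddof=1)   # step 1: std. dev. across the six training sets
    std_s = err_svr.std(axis=1, ddof=1)
    norm = (std_t - std_s) / std_s              # step 2: Eq. (22) for each function
    return norm.mean()                          # step 3: average across all functions

# Hypothetical error values for 26 functions and 6 training sets
rng = np.random.default_rng(2)
err_rsm = rng.uniform(0.1, 1.0, size=(26, 6))
err_svr = rng.uniform(0.1, 1.0, size=(26, 6))
print(normalized_std_dev(err_rsm, err_svr))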
Step 4: Compare the computational efficiency and transparency of each metamodeling technique. Following Jin et al. [2], the computational efficiency refers to the computational effort required to construct the metamodel and predict response values for a set of new points using the metamodel, and the transparency is the capability of providing information concerning contributions of different variables and interactions among variables. The five metamodeling techniques are compared on these two aspects in Section 4.2.3.

4.2 Analysis of Results

4.2.1 Accuracy Results. Figure 5 shows the percentages by which average error values were higher than the corresponding error value for SVR. These percentage differences have been averaged over all 26 approximated functions. Except for the overall maximum error for kriging, SVR has achieved lower average error values (i.e., RMSE, average error, and maximum error) compared to the four other approximation techniques. As previously mentioned, lower overall RMSE values represent good global function approximation across the performance space, while lower overall maximum error values reflect the absence of poorly approximated regions in the performance space. Thus, the results imply that kriging models avoid areas of poor approximation, but they do not perform as well globally as the SVR approximations.

Fig. 5 Comparison of errors between metamodeling techniques relative to SVR

In general, SVR has outperformed the four other approximation techniques, giving lower overall error values. Kriging achieved the next best overall performance, followed by MARS, RSM, and finally RBF. These trends are consistent with those observed in Ref. [37], but we note that RBF has performed much better in other studies [2,25]; we are still investigating why this has occurred.

Table 5 Frequency in which each metamodel performed best for each error measure

To further compare the performance of each algorithm according to the type of function being modeled, the three error statistics for linear and nonlinear function approximations were considered separately. Table 5 shows the number of times each metamodel type performed the best (i.e., achieved the lowest corresponding error statistic) for nonlinear and linear function approximations. SVR achieved the highest frequency of best performances for the nonlinear functions by a significant margin, whereas kriging outperforms more often for the linear functions by a smaller margin. These results are further confirmed in Figs. 6 and 7, which show the comparison of errors relative to SVR for the linear and nonlinear functions, respectively.



Fig. 6 Comparison of errors for linear functions relative to SVR
Fig. 7 Comparison of errors for nonlinear functions relative to SVR

This separation into linear and nonlinear functions reveals the stronger performance of kriging in approximating linear functions, especially in the maximum error category, indicating a good general performance over the entire input space. However, the performance of RSM for linear functions is not a significant improvement over its performance with nonlinear functions, as would be expected. It has been noticed that the errors produced by RSM for the linear functions within the spring test bed problem were particularly large, which greatly influenced this poor result when the errors were averaged. Further investigation is needed to understand why RSM did not perform as well as expected for the linear function approximations.

4.2.2 Robustness Results. Figure 8 presents the normalized standard deviations of the errors for each approximation technique, relative to SVR. Hence, the results indicate that SVR is the most robust of the five approximation techniques, reinforcing the validity of the error values obtained in Fig. 5. Kriging is the second most robust metamodeling technique. The variance between the errors for RSM is very large, when measured relative to SVR, indicating very inconsistent performance.



Fig. 8 Comparison of normalized standard deviations of errors between metamodels relative to SVR

This could be reflective of the fact that RSM is most suitable for linear approximations only, performing well in these cases and poorly in nonlinear cases. The relatively high robustness of SVR is followed by kriging, MARS, RBF, and finally RSM.

4.2.3 Computational Efficiency and Transparency. We end our analysis with a comparison of the computational efficiency and transparency of the five metamodeling techniques. The time to fit an SVR approximation was on the order of a few seconds using Gunn's [13] Matlab code running on a 1.8 GHz desktop PC, even with the optimization process, and prediction with a fitted SVR approximation took less than a second. The fitting and prediction times for SVR were very comparable to those for MARS, while RSM and RBF were much faster, taking less than a second each. The least computationally efficient were the kriging models, which could take several minutes to fit depending on the dimensionality of the problem and number of samples; fortunately, prediction with the kriging models took less than a second each. We refer the reader to Ref. [2] for a more detailed investigation of the effect of problem size and sample size on the computational efficiency of kriging, RSM, RBF, and MARS.
RSM provides the most transparency in terms of the function relationship and the factor contributions; see Eqs. (4) and (5). MARS provides some transparency since the MARS models can be recast into the form [30]

\hat{y} = a_0 + \sum_{K_m=1} f_i(x_i) + \sum_{K_m=2} f_{ij}(x_i, x_j) + \sum_{K_m=3} f_{ijk}(x_i, x_j, x_k) + \cdots    (23)

where the first sum is over all basis functions that involve only a single variable; the second is over all basis functions that involve two variables, representing two-variable interactions if present; the third is over three variables, etc. RBF and SVR have explicit function equations, Eqs. (6) and (20), respectively, but the individual factor contributions are not clear, making them even less transparent. The kriging models are the least transparent; however, the theta values can be interpreted with some practice [2,27,49].

5 Conclusions and Future Work

In conclusion, the theory behind SVR has been presented and shown to possess the desirable qualities of mathematical and algorithmic elegance, producing an actual approximating function as opposed to a trained black box. In comparison to four common approximating techniques, SVR had the best overall performance for the test bed of 26 engineering analysis functions. Only kriging outperformed SVR in the category of average lowest maximum error. The strong performance of the SVR approximations was reinforced through relatively small variances between error values, indicating that SVR also yields a more robust approximation. SVR provides a good compromise between the prediction accuracy and robustness of a kriging model, with computational efficiency and transparency near that of an RSM or RBF approximation. These results add to the growing body of promising empirical performance of SVR, and we are currently investigating the theoretical foundations of all five metamodeling techniques in more detail to better understand where SVR obtains its strength.
The SVR implementation employed produced successful results; however, better results using SVR are anticipated through increased attention to the SVR algorithm itself and the model parameters selected. The Matlab algorithm with which the SVR results were obtained is not as efficient as other available algorithms, such as SVMlight and mySVM (http://www-ai.cs.uni-dortmund.de/), which are available in C and more efficient than the relatively slower Matlab implementation; however, users are limited in the amount of tweaking they can do to the software. A better solution would be to code an entire SVR algorithm in C and employ any of a number of commercially available quadratic solvers to give users more flexibility in fitting and using SVR.
In addition, the SVR implemented in our experiments consistently used a Gaussian kernel function that was manually optimized for each set of training data.



Automatically optimizing the radius of the Gaussian kernel function during training could further improve the results. For example, data suspected to be polynomial will probably be modeled more accurately with the use of a polynomial kernel function instead of a Gaussian one; however, the optimal choice for the kernel function is still an area of active research and will be investigated in future work. Theoretically, any symmetric function that results in a positive definite kernel matrix can be used. The radius of the Gaussian kernel function was also manually optimized in this study for each function. Adopting a method to automatically optimize this radius, such as the simulated annealing algorithm in kriging [28], could further improve the results. Alternatively, cross-validation methods for fitting the SVR approximation could be investigated.
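As one concrete way to realize the cross-validation idea just mentioned, the Gaussian kernel radius could be selected by a grid search over candidate values. The sketch below uses scikit-learn's GridSearchCV with hypothetical training data and assumed C and \varepsilon values, and is only one of several possible tuning strategies.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Hypothetical training data for one test-bed function
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(30, 3))
y = X[:, 0] ** 2 + np.sin(5.0 * X[:, 1]) - X[:, 2]

# Candidate Gaussian-kernel radii r, converted to scikit-learn's gamma = 1 / (2 r^2)
radii = np.array([0.5, 1.0, 2.0, 5.0, 10.0])
param_grid = {"gamma": list(1.0 / (2.0 * radii ** 2))}

search = GridSearchCV(SVR(kernel="rbf", C=1000.0, epsilon=1e-4),
                      param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)
best_gamma = search.best_params_["gamma"]   # kernel width selected by cross-validation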

Acknowledgments

We gratefully acknowledge support from the U.S. Department of Transportation and the Federal Transit Administration. This work was conducted as part of the development of a web-based information management system for storing, displaying, and analyzing bus test data.

Appendix: Approximation Test Bed for Comparing Metamodeling Techniques



References

[1] Kleijnen, J. P. C., 1987, Statistical Tools for Simulation Practitioners, Marcel Dekker, New York.
[2] Jin, R., Chen, W., and Simpson, T. W., 2001, "Comparative Studies of Metamodeling Techniques Under Multiple Modeling Criteria," Struct. Multidiscip. Optim., 23(1), pp. 1-13.
[3] Haykin, S., 1999, Neural Networks: A Comprehensive Foundation, 2nd Edition, Prentice Hall, Upper Saddle River, NJ.
[4] Cristianni, N., and Shawe-Taylor, J., 2000, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK.
[5] Hearst, M. A., 1998, "Trends & Controversies: Support Vector Machines," IEEE Intell. Syst., 13(4), pp. 18-28.
[6] Takeuchi, K., and Collier, N., 2002, "Use of Support Vector Machines in Extended Named Entity," Proc. of Sixth Conference on Natural Language Learning (CoNLL-2002), D. Roth and A. van den Bosch, eds., Taipei, Taiwan, Association for Computational Linguistics, New Brunswick, NJ, pp. 119-125.
[7] Dumais, S. T., Platt, J., Heckerman, D., and Saharni, M., 1998, "Inductive Learning Algorithms and Representations for Text Categorization," Proc. of 7th Int. Conference on Information and Knowledge Management, Bethesda, MD, ACM, New York, pp. 148-155.
[8] Prakasvudhisarn, C., Trafalis, T. B., and Raman, S., 2003, "Support Vector Regression for Determination of Minimum Zone," ASME J. Manuf. Sci. Eng., 125(4), pp. 736-739.
[9] Vapnik, V., and Lerner, A., 1963, "Pattern Recognition Using Generalized Portrait Method," Autom. Remote Control (Engl. Transl.), 24(6), pp. 774-780.
[10] Vapnik, V., 1995, The Nature of Statistical Learning Theory, Springer, New York.
[11] Smola, A. J., and Schölkopf, B., 1998, "A Tutorial on Support Vector Regression," NeuroCOLT2 Technical Report Series, NC2-TR-1998-030, Berlin, Germany.
[12] Vapnik, V., Golowich, S., and Smola, A., 1997, "Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing," Advances in Neural Information Processing Systems, M. Mozer, M. Jordan, and T. Petsche, eds., MIT Press, Cambridge, MA, pp. 281-287.
[13] Gunn, S. R., 1997, "Support Vector Machines for Classification and Regression," Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton, UK.
[14] Simpson, T. W., Peplinski, J., Koch, P. N., and Allen, J. K., 2001, "Metamodels for Computer-Based Engineering Design: Survey and Recommendations," Eng. Comput., 17(2), pp. 129-150.
[15] Myers, R. H., and Montgomery, D. C., 1995, Response Surface Methodology: Process and Product Optimization Using Designed Experiments, Wiley, New York.
[16] Cappelleri, D. J., Frecker, M. I., Simpson, T. W., and Snyder, A., 2002, "A Metamodel-Based Approach for Optimal Design of a PZT Bimorph Actuator for Minimally Invasive Surgery," J. Mech. Des., 124(2), pp. 354-357.
[17] Chen, W., Allen, J. K., Tsui, K.-L., and Mistree, F., 1996, "A Procedure for Robust Design: Minimizing Variations Caused by Noise and Control Factors," J. Mech. Des., 118(4), pp. 478-485.
[18] Korngold, J. C., and Gabriele, G. A., 1997, "Multidisciplinary Analysis and Optimization of Discrete Problems Using Response Surface Methods," J. Mech. Des., 119(4), pp. 427-433.
[19] Wang, G., 2003, "Adaptive Response Surface Method Using Inherited Latin Hypercube Designs," J. Mech. Des., 125(2), pp. 210-220.
[20] Hernandez, G., Simpson, T. W., Allen, J. K., Bascaran, E., Avila, L. F., and Salinas, F., 2001, "Robust Design of Families of Products With Production Modeling and Evaluation," J. Mech. Des., 123(2), pp. 183-190.
[21] Lewis, K., and Mistree, F., 1998, "Collaborative, Sequential, and Isolated Decisions in Design," J. Mech. Des., 120(4), pp. 643-652.
[22] Dyn, N., Levin, D., and Rippa, S., 1986, "Numerical Procedures for Surface Fitting of Scattered Data by Radial Basis Functions," SIAM (Soc. Ind. Appl. Math.) J. Sci. Stat. Comput., 7(2), pp. 639-659.
[23] Powell, M. J. D., 1987, "Radial Basis Functions for Multivariable Interpolation: A Review," Algorithms for Approximation, J. C. Mason and M. G. Cox, eds., Oxford University Press, London, pp. 143-167.
[24] Tu, C., and Barton, R. R., 1997, "Production Yield Estimation by the Metamodel Method with a Boundary-Focused Experiment Design," ASME Design Engineering Technical Conferences-Design Theory and Methodology, Sacramento, CA, ASME, Paper No. DETC97/DTM-3870.
[25] Meckesheimer, M., Barton, R. R., Simpson, T. W., Limayem, F., and Yannou, B., 2001, "Metamodeling of Combined Discrete/Continuous Responses," AIAA J., 39(10), pp. 1955-1959.
[26] Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P., 1989, "Design and Analysis of Computer Experiments," Stat. Sci., 4(4), pp. 409-435.
[27] Koehler, J. R., and Owen, A. B., 1996, "Computer Experiments," Handbook of Statistics, S. Ghosh and C. R. Rao, eds., Elsevier Science, New York, pp. 261-308.
[28] Simpson, T. W., Mauery, T. M., Korte, J. J., and Mistree, F., 2001, "Kriging Metamodels for Global Approximation in Simulation-Based Multidisciplinary Design Optimization," AIAA J., 39(12), pp. 2233-2241.
[29] Pacheco, J. E., Amon, C. H., and Finger, S., 2003, "Bayesian Surrogates Applied to Conceptual Stages of the Engineering Design Process," J. Mech. Des., 125(4), pp. 664-672.
[30] Friedman, J. H., 1991, "Multivariate Adaptive Regression Splines," Ann. Stat., 19(1), pp. 1-67.
[31] Chen, V. C. P., Ruppert, D., and Shoemaker, C. A., 1999, "Applying Experimental Design and Regression Splines to High-Dimensional Continuous-State Stochastic Dynamic Programming," Oper. Res., 47, pp. 38-53.
[32] Chen, V. C. P., 1999, "Application of MARS and Orthogonal Arrays to Inventory Forecasting Stochastic Dynamic Programs," Comput. Stat. Data Anal., 30, pp. 317-341.
[33] Smola, A. J., Schölkopf, B., and Müller, K. R., 1998, "The Connection Between Regularization Operators and Support Vector Kernels," Neural Networks, 11(4), pp. 637-649.
[34] Schölkopf, B., and Smola, A. J., 2002, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA.
[35] Su, J., and Renaud, J. E., 1997, "Automatic Differentiation in Robust Optimization," AIAA J., 35(6), pp. 1072-1079.
[36] Markowetz, F., 2001, "Support Vector Machines in Bioinformatics," Diploma Thesis in Mathematics, University of Heidelberg, Germany.



[37] Simpson, T. W., 1998, "A Concept Exploration Method for Product Family Design," Ph.D. Dissertation, G.W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA.
[38] Schmit, L. A., 1981, "Structural Synthesis: Its Genesis and Development," AIAA J., 19(10), pp. 1249-1263.
[39] Arora, J. S., 1989, Introduction to Optimum Design, McGraw-Hill, New York.
[40] Sandgren, E., 1990, "Nonlinear Integer and Discrete Programming in Mechanical Design Optimization," J. Mech. Des., 112(2), pp. 223-229.
[41] Ragsdell, K. M., and Phillips, D. T., 1976, "Optimal Design of a Class of Welded Structures Using Geometric Programming," ASME J. Eng. Ind., 98(3), pp. 1021-1025.
[42] Simpson, T. W., Lin, D. K. J., and Chen, W., 2001, "Sampling Strategies for Computer Experiments: Design and Analysis," Int. J. Reliab. Appl., 2(3), pp. 209-240.
[43] McKay, M. D., Beckman, R. J., and Conover, W. J., 1979, "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code," Technometrics, 21(2), pp. 239-245.
[44] Kalagnanam, J. R., and Diwekar, U. M., 1997, "An Efficient Sampling Technique for Off-Line Quality Control," Technometrics, 39(3), pp. 308-319.
[45] Tang, B., 1993, "Orthogonal Array-Based Latin Hypercubes," J. Am. Stat. Assoc., 88(424), pp. 1392-1397.
[46] Fang, K.-T., Lin, D. K. J., Winker, P., and Zhang, Y., 2000, "Uniform Design: Theory and Application," Technometrics, 42, pp. 237-248.
[47] Wang, W. J., Xu, Z. B., Lu, W. Z., and Zhang, X. Y., 2003, "Determination of the Spread Parameter in the Gaussian Kernel for Classification and Regression," Neurocomputing, 55(1), pp. 643-663.
[48] Cameron, A. C., and Windmeijer, F. A. G., 1997, "An R-Squared Measure of Goodness of Fit for Some Common Nonlinear Regression Models," J. Econometr., 77(2), pp. 329-342.
[49] Martin, J. D., and Simpson, T. W., 2004, "On the Use of Kriging Models to Approximate Deterministic Computer Models," ASME Design Engineering Technical Conferences-Design Automation Conference, Salt Lake City, W. Chen, ed., ASME, New York, ASME Paper No. DETC2004/DAC-57300.
