
Asymmetrical and Lower Bounded Support Vector Regression for Power Estimation

Mel Stockman#1, Mariette Awad#2, Rahul Khanna*3


# Electrical and Computer Engineering Department, American University of Beirut, Beirut, Lebanon
1 mas113@aub.edu.lb, 2 mariette.awad@aub.edu.lb

* Intel Corporation, Oregon, USA
3 rahul.khanna@intel.com

Abstract: In an energy aware environment, designers frequently turn to advanced power reduction techniques such as power shutoff and multi-supply-voltage architectures. In order to implement these techniques, power estimates must be made. Power prediction is a critical necessity as chip sizes continually decrease and low power consumption is a foremost design objective. For such predictions, it is crucial to avoid underestimating power, since underestimates can lead to reliability issues and possible chip damage. It becomes necessary to eliminate or strictly limit underestimates by relaxing accuracy constraints while decreasing the likelihood that the estimate undershoots the actual value. Our novel approach, Asymmetrical and Lower Bounded Support Vector Regression (ALB-SVR), modifies the Support Vector Regression technique by Vapnik and provides accurate prediction while maintaining a low number of underestimates. We tested our approach on two different power data sets and achieved accuracy rates of 5.72% and 5.06% relative percentage error while keeping the number of underestimates below 2.81% and 1.74%.

Keywords: Asymmetrical Loss Function, Support Vector Regression, Bounded Function

I. INTRODUCTION

Power prediction is a critical necessity as chip sizes shrink and the desire for low power consumption is a foremost design objective. Estimates of power consumption are used both in chip design (to implement techniques such as power shutoff (PSO) and multi-supply-voltage (MSV) architectures) and in production environments and data centers (to determine cooling and UPS needs). In the design area, under-predicting power could lead to chip reliability issues, poorer than anticipated application performance, unavailable processor resources, needlessly powering down chip components, etc. In a production environment, under-predicting power could lead to insufficient cooling of data centers or inadequate UPS power supply. In these cases, it is crucial to minimize underestimates even at the risk of reducing the accuracy of the estimation.

Support Vector Regression (SVR) as proposed by Vapnik [1] has proven to be an effective tool in real value function estimation. The usual approach trains using a symmetrical loss function which equally penalizes both high and low misestimates. Using Vapnik's ε-insensitive approach, a flexible tube of minimal radius is formed symmetrically around the estimated function, such that absolute errors smaller than a threshold ε are ignored both above and below the estimate. In this manner, points outside the tube are penalized, but those within the tube, whether above or below the function, receive no penalty.

However, when misestimates in one direction have damaging consequences, it is essential to restrict the loss function so that underestimates are eliminated as much as possible. In this paper we propose modifying Vapnik's ε-insensitive loss function to be asymmetrical in order to limit underestimates. The ε-tube is cut in half: only the half on the overestimation side is kept, so that the estimated function itself bounds the tube, and a much higher penalty is applied to estimates that fall below the actual values. This leads to an asymmetric loss function for training, whereby a greater penalty is applied when the misestimate is low than when it is high.

The remainder of this paper is organized as follows: Section II discusses prior research. Section III gives a brief overview of regular support vector regression (SVR). Section IV explains our asymmetrical approach and its derivation. Section V describes the two different data sets. Section VI presents our experimental results. Section VII compares SVR and ALB-SVR, and Section VIII provides our conclusion.

II. PRIOR RESEARCH

Although we are not aware of prior work specifically addressing our approach, we survey in this section some related work available in the literature. The authors in [2] use an asymmetric ε-insensitive loss function in Support Vector Quantile Regression (SVQR) in an attempt to decrease the number of support vectors. They alter the insensitiveness according to the quantile and achieve a sparser model. Our work differs from theirs in that their aim was to decrease the number of support vectors while maintaining the same accuracy as regular SVQR, while our approach specifically seeks to limit underestimates at a possible cost to accuracy. Asymmetrical loss functions are discussed in [3], where the authors study different loss functions for Bayes parameter estimation. They use a 2-sided quadratic loss function and a quasi-quadratic s-loss function, and derive comparative results to illustrate that this modified version shows a smaller increase of loss and can be used in real world situations where overestimation and underestimation have different importance.

The authors in [4] study Bayesian risk analysis and replace the quadratic loss function with an asymmetric loss function to derive a general class of functions which approach infinity near the origin in order to limit underestimates. In [5], the authors present a maximum margin classifier which bounds misclassification for each class differently, thus allowing for different tolerance levels. In [6], the authors use a smoothing strategy to modify the typical SVR approach into an unconstrained problem, thereby solving only a system of linear equations rather than a convex quadratic program. In [7], three different loss functions are compared for economic tolerance design: Taguchi's quadratic loss function, the Inverted Normal Loss Function, and the Revised Inverted Normal Loss Function.

III. SUPPORT VECTOR REGRESSION

In Vapnik's ε-insensitive SVR [1], a real value y is predicted as

    y_i = w \cdot x_i + b, \quad i = 1, \ldots, N    (1)

using a tube bounded by ε, as shown in Fig. 1. The penalty function is characterized by only assigning a penalty if the predicted value y_i is more than ε away from the actual value t_i (i.e. |t_i - y_i| \ge \varepsilon). Those data points which lie outside the ε-tube are Support Vectors (SVs) and are given the same penalty whether they lie above (ξ⁺ > 0) or below (ξ⁻ > 0) the tube:

    t_i \le y_i + \varepsilon + \xi_i^+    (2)

    t_i \ge y_i - \varepsilon - \xi_i^-    (3)

Figure 1. SVR with ε-insensitive tube

The accuracy of the estimation is then measured by the loss function L_\varepsilon(t, y), shown in Fig. 2:

    L_\varepsilon(t, y) = \begin{cases} 0, & |t - y| \le \varepsilon \\ |t - y| - \varepsilon, & \text{otherwise} \end{cases}    (4)

Figure 2. ε-insensitive loss function

The empirical risk is

    R_{emp} = \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon(t_i, y_i)    (5)

leading to the SVR error function

    C \sum_{i=1}^{N} (\xi_i^+ + \xi_i^-) + \frac{1}{2} \lVert w \rVert^2    (6)

which should be minimized subject to the constraints \xi_i^+ \ge 0, \xi_i^- \ge 0, (2) and (3).

IV. ASYMMETRICAL AND LOWER BOUNDED SUPPORT VECTOR REGRESSION

Our novel approach, Asymmetrical and Lower Bounded Support Vector Regression (ALB-SVR), modifies the SVR loss function and corresponding error function so that the ε-insensitive zone lies only on the overestimation side of the estimated function, as shown in Figs. 3 and 4. The penalty parameter C is split into C⁺ and C⁻ so that different penalties can be applied to the upper and lower mispredictions. Equations (2), (4) and (6) are modified as follows:

    t_i \le y_i + \xi_i^+    (7)

with \xi_i^+ \ge 0, \xi_i^- \ge 0 and (3) unchanged, so that any estimate falling below the actual value immediately incurs slack;

    L_\varepsilon^{ALB}(t, y) = \begin{cases} t - y, & t - y > 0 \\ 0, & -\varepsilon \le t - y \le 0 \\ |t - y| - \varepsilon, & t - y < -\varepsilon \end{cases}    (8)

    C^+ \sum_{i=1}^{N} \xi_i^+ + C^- \sum_{i=1}^{N} \xi_i^- + \frac{1}{2} \lVert w \rVert^2    (9)

Figure 3. ALB-SVR with ε-insensitive tube

Figure 4. ALB-SVR loss function
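To make the asymmetry concrete, the short Python sketch below (our illustration, not from the paper; function names are invented) evaluates the symmetric loss (4) and the ALB-SVR loss (8) on a grid of signed errors e = t - y. Note that the C⁺/C⁻ weighting enters only through the error function (9), not the loss itself:

    import numpy as np

    def svr_loss(e, eps):
        # Symmetric eps-insensitive loss, eq. (4); e = t - y is the signed error
        return np.maximum(0.0, np.abs(e) - eps)

    def alb_svr_loss(e, eps):
        # ALB-SVR loss, eq. (8): underestimates (e > 0) are penalized
        # immediately; overestimates are free up to eps, linear beyond it.
        return np.where(e > 0.0, e, np.maximum(0.0, -e - eps))

    e = np.linspace(-2.0, 2.0, 9)
    print(svr_loss(e, eps=0.5))
    print(alb_svr_loss(e, eps=0.5))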

Introducing Lagrange multipliers \eta_i^+ \ge 0, \eta_i^- \ge 0, \alpha_i^+ \ge 0, \alpha_i^- \ge 0, the Lagrangian of (9) subject to (7) and (3) is

    L = C^+ \sum_{i=1}^{N} \xi_i^+ + C^- \sum_{i=1}^{N} \xi_i^- + \frac{1}{2} \lVert w \rVert^2 - \sum_{i=1}^{N} (\eta_i^+ \xi_i^+ + \eta_i^- \xi_i^-) - \sum_{i=1}^{N} \alpha_i^+ (\xi_i^+ + y_i - t_i) - \sum_{i=1}^{N} \alpha_i^- (\varepsilon + \xi_i^- + t_i - y_i)    (10)

which leads to:

    \partial L / \partial w = 0 \Rightarrow w = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) x_i    (11)

    \partial L / \partial b = 0 \Rightarrow \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) = 0    (12)

    \partial L / \partial \xi_i^+ = 0 \Rightarrow C^+ = \alpha_i^+ + \eta_i^+    (13)

    \partial L / \partial \xi_i^- = 0 \Rightarrow C^- = \alpha_i^- + \eta_i^-    (14)

Substituting (11) and (12) gives the dual L_D, to be maximized with respect to \alpha_i^+ and \alpha_i^- (\alpha_i^+ \ge 0, \alpha_i^- \ge 0), where

    L_D = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) t_i - \varepsilon \sum_{i=1}^{N} \alpha_i^- - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-) \, x_i \cdot x_j    (15)

Since \eta_i^+ \ge 0 and \eta_i^- \ge 0, (13) and (14) imply \alpha_i^+ \le C^+ and \alpha_i^- \le C^-. Thus we need to find

    \max_{\alpha^+, \alpha^-} \left[ \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) t_i - \varepsilon \sum_{i=1}^{N} \alpha_i^- - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-) \, x_i \cdot x_j \right]    (16)

with 0 \le \alpha_i^+ \le C^+ and 0 \le \alpha_i^- \le C^-, subject to (12). Substituting (11) into (1):

    y = \sum_{i=1}^{N} (\alpha_i^+ - \alpha_i^-) \, x_i \cdot x + b    (17)

Support Vectors (SVs) x_s are found at the indices where 0 < \alpha_s^+ < C^+ and \xi_s^+ = 0 (or 0 < \alpha_s^- < C^- and \xi_s^- = 0). Taking a point of the latter kind, which lies exactly on the ε boundary of the tube, b can be derived by:

    b = t_s + \varepsilon - \sum_{m=1}^{N} (\alpha_m^+ - \alpha_m^-) \, x_m \cdot x_s    (18)
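For illustration, the dual (16) with its box constraints and the equality constraint (12) can be handed to an off-the-shelf QP solver. The sketch below is a hypothetical standalone implementation for small data sets (the experiments in Section VI instead modify LIBSVM's decomposition solver); it uses CVXOPT and substitutes an RBF kernel for the dot products, with g the kernel parameter from Table III:

    import numpy as np
    from cvxopt import matrix, solvers

    def rbf_kernel(X, Z, g):
        # K[i, j] = exp(-g * ||X[i] - Z[j]||^2)
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-g * d2)

    def alb_svr_fit(X, t, C_plus, C_minus, eps, g):
        N = len(t)
        K = rbf_kernel(X, X, g)
        # Stack z = [alpha_plus; alpha_minus] and minimize 1/2 z'Pz + q'z,
        # the negative of the dual objective (16).
        P = np.block([[K, -K], [-K, K]])
        q = np.concatenate([-t, t + eps])
        # Box constraints: 0 <= alpha+ <= C+, 0 <= alpha- <= C-
        G = np.vstack([-np.eye(2 * N), np.eye(2 * N)])
        h = np.concatenate([np.zeros(2 * N),
                            np.full(N, float(C_plus)), np.full(N, float(C_minus))])
        # Equality constraint (12): sum(alpha+ - alpha-) = 0
        A = np.concatenate([np.ones(N), -np.ones(N)]).reshape(1, -1)
        sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h),
                         matrix(A), matrix(np.zeros(1)))
        z = np.asarray(sol['x']).ravel()
        coef = z[:N] - z[N:]                      # alpha+ - alpha-
        # Bias from eq. (18); assumes a point on the eps boundary exists
        s = int(np.argmax((z[N:] > 1e-8) & (z[N:] < C_minus - 1e-8)))
        b = t[s] + eps - K[s] @ coef
        return coef, b

    def alb_svr_predict(X_train, coef, b, X_new, g):
        # eq. (17) with the kernel substituted for the dot product
        return rbf_kernel(X_new, X_train, g) @ coef + b

For data sets the size of RAPL (17765 samples), a dense QP over 2N variables is impractical, which is one reason a decomposition solver such as LIBSVM's is used in practice.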
V. DATA SETS

A. Gladiator Data Set

The Gladiator data set from [8] consists of 640 samples of 6 attributes of telemetry data from a distributed set of physical and logical sensors, as shown in Table I, along with the corresponding power in milliwatts.

TABLE I
GLADIATOR DATA SET ATTRIBUTES

    Attribute  | Description
    CPU1 Vtt1  | termination, misc I/O power
    CPU1 Vcc1  | core power
    CPU1 Vsa   | system agent, uncore, I/O power
    CPU2 Vtt1  | termination, misc I/O power
    CPU2 Vcc1  | core power
    CPU2 Vsa   | system agent, uncore, I/O power

B. RAPL Data Set

The data set taken from [9], [10] consists of 17765 samples of 5 attributes of memory activity counters, as described in Table II, with the actual corresponding power consumed in watts as measured directly by a memory power riser.

TABLE II
MEMORY POWER MODEL ATTRIBUTES

    Activity     | Units
    Activate (A) | nJ/Activate
    Read (R)     | nJ/Read
    Write (W)    | nJ/Write
    CKE = High   | mW
    CKE = Low    | mW
VI. EXPERIMENTS AND RESULTS

We modified the code in LIBSVM [11] to implement ALB-SVR. For all experiments, we normalized the data and took the average of 10 runs of 3-fold cross validation. Using an RBF kernel, we performed a grid search combined with heuristic experimentation for both SVR and ALB-SVR to find the best meta parameters ε, g, C⁺ and C⁻. Table III and Figs. 5-8 show the results of SVR and ALB-SVR. The table shows the values of the C⁺, C⁻, ε and g meta parameters, along with the number of SVs, the total number of iterations needed for the algorithm to converge, the relative percentage error, and the percentage of estimated data points falling below the actual values. As can be seen, the number of underestimates for SVR is around 50%, which is due to the fact that SVR centers the ε-tube on the data. ALB-SVR instead places the estimated function at the top of the half-tube, allowing only a small number of estimated points to fall below the actual values. This effectively limits underestimation, at the cost of a slight decrease in the accuracy of ALB-SVR; the accuracy is necessarily lower than that of SVR because the estimation is now skewed higher. Model performance is evaluated by computing the percentage relative error as:

    E = \frac{100}{N} \sum_{i=1}^{N} \frac{|t_i - y_i|}{t_i}    (19)
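A minimal sketch of the evaluation loop follows, reusing the hypothetical alb_svr_fit / alb_svr_predict helpers sketched in Section IV (metric names are invented; the parameter values in the comment come from Table III):

    import numpy as np

    def pct_relative_error(t, y):
        # Percentage relative error, eq. (19)
        return 100.0 * np.mean(np.abs(t - y) / t)

    def pct_underestimates(t, y):
        # "% out of bound" in Table III: estimates falling below the actuals
        return 100.0 * np.mean(y < t)

    def cross_validate(X, t, params, n_folds=3):
        # One run of n-fold cross validation for one parameter setting, e.g.
        # params = {"C_plus": 32768, "C_minus": 32, "eps": 0.00039, "g": 1}
        idx = np.random.permutation(len(t))
        folds = np.array_split(idx, n_folds)
        errs, unders = [], []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            coef, b = alb_svr_fit(X[train], t[train], **params)
            y = alb_svr_predict(X[train], coef, b, X[test], params["g"])
            errs.append(pct_relative_error(t[test], y))
            unders.append(pct_underestimates(t[test], y))
        return np.mean(errs), np.mean(unders)

The grid search simply repeats cross_validate over candidate parameter settings, averaged across 10 runs, and keeps the setting with the best trade-off between the two metrics.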

The percentage relative error for ALB-SVR on the Gladiator data set was 5.72%, and on the RAPL data set it was 5.06%.
This is only slightly higher than SVR, which achieved 1.72% and 1.82% on the respective data sets, and is acceptable since we have drastically reduced the number of underestimates. As can also be seen, the number of support vectors is greater in ALB-SVR than in SVR.

VII. COMPARISON OF SVR AND ALB-SVR

Comparing ALB-SVR to SVR allows us to examine the trade-offs involved in using this technique.

A. Empirical Risk

By substituting the new loss function, ALB-SVR's empirical risk becomes:

    R_{emp}^{ALB} = \frac{1}{N} \sum_{i=1}^{N} L_\varepsilon^{ALB}(t_i, y_i)    (20)

Comparing (8) with (4), the two losses differ only for points with t_i > y_i, and there by at most ε per point, so the maximum additional empirical risk for ALB-SVR can be computed to be:

    R_{emp}^{ALB} - R_{emp} \le \frac{\varepsilon}{N} \, \bigl| \{ i : t_i > y_i \} \bigr|    (21)
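A quick numeric check of the bound (21) on a handful of signed errors (values invented for the example):

    import numpy as np

    eps = 0.5
    e = np.array([1.2, 0.3, -0.2, -0.9, 2.0])             # signed errors t - y
    svr = np.maximum(0.0, np.abs(e) - eps)                 # eq. (4)
    alb = np.where(e > 0.0, e, np.maximum(0.0, -e - eps))  # eq. (8)
    gap = alb.mean() - svr.mean()                          # empirical risk gap
    bound = eps * np.mean(e > 0.0)                         # right side of (21)
    assert gap <= bound + 1e-12
    print(gap, bound)                                      # 0.26 0.3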

B. Number of Support Vectors (SVs) and Convergence

In SVR, support vectors are those points which lie outside the ε-tube. The smaller the value of ε, the more points lie outside the tube and hence the more SVs there are. In ALB-SVR, we have essentially cut the ε-tube in half: the dead-zone on the underestimation side is gone. Therefore, for the same g and ε parameters, more points lie outside the tube and there is a larger number of SVs. This increase in the number of SVs indicates that ALB-SVR has some negative effect on the complexity of the estimating function. However, as seen in Table III, the Gladiator data set did not show a significant increase in SVs, which may be because that data set is relatively small. As can also be seen in Table III, the number of iterations was smaller for ALB-SVR, indicating that the algorithm converged faster; this may offset the larger number of SVs. Using a grid search and heuristics to determine the optimal meta parameters, we achieved the goal of limiting the underestimates to 2.81% for the Gladiator data set and 1.74% for the RAPL data set, as compared to 50.33% and 57.54% for SVR.

VIII. CONCLUSION AND FUTURE WORK

We have shown our novel approach, ALB-SVR, to be an effective technique for bounding an estimation such that underestimates are greatly limited. This comes at the expense of accuracy, but is nevertheless helpful for applications which are highly sensitive to such mispredictions, such as power estimation. We tested our approach on two different power data sets and achieved accuracy rates of 5.72% and 5.06% relative percentage error while keeping the number of underestimates below 2.81% and 1.74%. Future work will include different data sets and techniques for more accurately selecting the meta parameters, as well as improving the percentage error.

ACKNOWLEDGEMENTS

This work is partly supported by MER, a partnership between Intel Corporation and King Abdul-Aziz City for Science and Technology (KACST) to conduct and promote research in the Middle East, and by the University Research Board at the American University of Beirut.

REFERENCES
[1] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[2] K. Seok, D. Cho, C. Hwang, and J. Shim, "Support vector quantile regression using asymmetric e-insensitive loss function," in Proc. 2nd International Conference on Education Technology and Computer (ICETC), vol. 1, pp. V1-438-V1-439, June 2010.
[3] H. Schabe, "Bayes estimates under asymmetric loss," IEEE Transactions on Reliability, vol. 40, no. 1, pp. 63-67, Apr. 1991.
[4] J. G. Norstrom, "The use of precautionary loss functions in risk analysis," IEEE Transactions on Reliability, vol. 45, no. 3, pp. 400-403, Sep. 1996.
[5] J. Saketha Nath and C. Bhattacharyya, "Maximum margin classifiers with specified false positive and false negative error rates," in Proc. SDM Conference, Minneapolis, 2007.
[6] Y.-J. Lee, W.-F. Hsieh, and C.-M. Huang, "ε-SSVR: a smooth support vector machine for ε-insensitive regression," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 5, pp. 678-685, May 2005.
[7] J.-N. Pan and J. Pan, "A comparative study of various loss functions in the economic tolerance design," in Proc. IEEE International Conference on Management of Innovation and Technology, vol. 2, pp. 783-787, June 2006.
[8] Intel Document, "Gladiator Telemetry Harness for Energy Efficient Computing," 2010.
[9] H. David, E. Gorbatov, U. Hanebutte, R. Khanna, and C. Le, "RAPL: memory power estimation and capping," in Proc. International Symposium on Low Power Electronics and Design (ISLPED), pp. 14-15, Aug. 2010.
[10] M. Stockman, M. Awad, R. Khanna, C. Le, H. David, E. Gorbatov, and U. Hanebutte, "A novel approach to memory power estimation using machine learning," in Proc. International Conference on Energy Aware Computing (ICEAC), pp. 1-3, Dec. 2010.
[11] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm


TABLE III
COMPARATIVE RESULTS OF SVR VS. ALB-SVR

    Model # | Data      | Model   | RBF kernel param g | C+      | C-  | ε       | # SV | # Iter | % relative error | % out of bound
    1       | Gladiator | SVR     | 16                 | 512     | -   | 0.23    | 319  | 2723   | 1.72             | 50.33
    2       | Gladiator | ALB-SVR | 1                  | 32768   | 32  | 0.00039 | 320  | 417    | 5.72             | 2.81
    3       | RAPL      | SVR     | 706                | 512     | -   | 0.10    | 2786 | 571649 | 1.82             | 57.54
    4       | RAPL      | ALB-SVR | 706                | 1000000 | 10  | 0.2     | 4932 | 58220  | 5.06             | 1.74

(For the SVR models, the C+ column gives the single penalty parameter C; C- does not apply.)

Figure 5. Power estimates for Gladiator Data with SVR (Model #1)

Figure 6. Power estimates for Gladiator Data with ALB-SVR (Model #2)

Figure 7. Power estimates for RAPL Data with SVR (Model #3)

Figure 8. Power estimates for RAPL Data with ALB-SVR (Model #4)
