
A Forecasting Capability Study of Empirical Mode Decomposition for the Arrival Time of a Parallel Batch System

Linh Ngo and Amy Apon
University of Arkansas

Doug Hoffman
Acxiom Corporation

Introduction:
Empirical Mode Decomposition (EMD)
Huang et al.
Represents non-stationary, complex time signals as a sum of Intrinsic Mode Functions (IMFs)
IMF:
  The number of extrema and the number of zero crossings in the whole data set must either be equal or differ at most by one
  At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero
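
The first IMF condition can be checked by direct counting. The sketch below is illustrative only: the function names are hypothetical, extrema are counted as strict interior local maxima and minima, and zero crossings as sign changes between consecutive samples. It does not test the second (envelope-mean) condition.

```python
import math

def count_extrema(x):
    """Count strict local maxima and minima among the interior samples of x."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count

def count_zero_crossings(x):
    """Count sign changes between consecutive samples of x."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def satisfies_first_imf_condition(x):
    """True if the extrema count and zero-crossing count differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

# Illustrative signals (not from the study): a sine wave oscillates around
# zero, while the same wave shifted upward by 2 never crosses zero.
wave = [math.sin(0.5 * k + 0.2) for k in range(60)]
shifted = [v + 2.0 for v in wave]
print(satisfies_first_imf_condition(wave), satisfies_first_imf_condition(shifted))
```

A signal riding on a non-zero offset or trend fails the check, which is why sifting repeatedly removes the local mean before a component is accepted as an IMF.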

Introduction:
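EMD Sifting Process
[Flowchart]
  1. Start with the data.
  2. Construct the upper-envelope cubic spline and the lower-envelope cubic spline.
  3. Find the mean of the upper and lower envelopes, and subtract this mean from the data.
  4. IMF? If no, repeat from step 2 on the result. If yes, subtract the IMF from the data.
  5. Monotonic? Manual? If yes, stop. If no, repeat from step 2 on the remaining data.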
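
The sifting loop can be sketched in a few dozen lines. This is a simplified illustration, not the implementation used in the study. It makes two assumptions that differ from the slide: the envelopes are piecewise-linear rather than cubic splines, and a fixed number of sifts per IMF stands in for the "IMF?" test. All function and parameter names are hypothetical.

```python
import math

def local_extrema(x):
    """Return (maxima, minima) index lists of strict interior extrema."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.
    ASSUMPTION: linear interpolation replaces the cubic spline, and the
    envelope is held flat outside the first and last knot."""
    env, j = [], 0
    for i in range(len(x)):
        if i <= knots[0]:
            env.append(x[knots[0]])
        elif i >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            while knots[j + 1] < i:
                j += 1
            left, right = knots[j], knots[j + 1]
            t = (i - left) / (right - left)
            env.append(x[left] + t * (x[right] - x[left]))
    return env

def sift_once(x):
    """Subtract the mean of the upper and lower envelopes from x.
    Returns None when there are too few extrema to build envelopes."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper, lower = envelope(x, maxima), envelope(x, minima)
    return [v - (u + l) / 2 for v, u, l in zip(x, upper, lower)]

def is_monotonic(x):
    """True if x never increases or never decreases."""
    diffs = [b - a for a, b in zip(x, x[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def emd(x, max_imfs=8, sifts_per_imf=10):
    """Extract IMFs until the residual is monotonic or max_imfs is reached.
    ASSUMPTION: a fixed sift count replaces the IMF acceptance test."""
    imfs, residual = [], list(x)
    for _ in range(max_imfs):
        if is_monotonic(residual):
            break
        candidate = residual
        for _ in range(sifts_per_imf):
            sifted = sift_once(candidate)
            if sifted is None:
                break
            candidate = sifted
        if candidate is residual:  # no sift was possible
            break
        imfs.append(candidate)
        residual = [r - c for r, c in zip(residual, candidate)]
    return imfs, residual

# Illustrative signal (not from the study): two oscillations plus a trend.
signal = [math.sin(0.9 * k) + 0.3 * math.sin(0.1 * k) + 0.01 * k
          for k in range(200)]
imfs, residual = emd(signal)
print(len(imfs), "IMFs extracted")
```

Because each IMF is subtracted from the running residual, the IMFs and the final residual always sum back to the original signal, whichever envelope method is used.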

Introduction:
IMF example

Introduction:
Application to Arrival Histogram
[Figure: Arrival Histogram of March 2007. Y-axis: Arrival Count Per Bucket (0 to 300). X-axis: One-Hour Bucket (1 to 721), beginning on 00:00 Wednesday March 01 and ending on 23:58 Saturday March 31.]
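
A histogram of this kind can be built by assigning each job arrival to a one-hour bucket counted from the start of the trace. The sketch below is a minimal illustration: the function name is hypothetical and the timestamps are made-up examples, not data from the study.

```python
from datetime import datetime

def hourly_arrival_histogram(arrivals, start, n_buckets):
    """Count arrivals per one-hour bucket, starting at `start`.
    Arrivals outside the covered range are ignored."""
    counts = [0] * n_buckets
    for t in arrivals:
        idx = int((t - start).total_seconds() // 3600)
        if 0 <= idx < n_buckets:
            counts[idx] += 1
    return counts

# Made-up arrival timestamps for illustration only.
start = datetime(2007, 3, 1, 0, 0)
arrivals = [
    datetime(2007, 3, 1, 0, 5),
    datetime(2007, 3, 1, 0, 59),
    datetime(2007, 3, 1, 1, 0),
    datetime(2007, 3, 31, 23, 58),
]
histogram = hourly_arrival_histogram(arrivals, start, n_buckets=31 * 24)
print(len(histogram), sum(histogram))
```

The resulting list of counts is the time series that EMD decomposes.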

Introduction:
Application to Arrival Histogram

Workload Characterization using EMD

EMD and Workload Characterization
  Workload histogram decomposition
  Piecewise sine fitting
Characterization Results
  Improvements over hyper-exponential distribution
  Requires non-trivial manual fine-tuning
  Impractical compared with traditional distribution techniques

Can EMD do anything else?

Workload Forecasting
  Preprocessing data for forecasting techniques
  Improve accuracy and flexibility of predicted data
A common preprocessing technique for both workload characterization and forecasting
  Workload characterization model that reflects actual future workload
  Forecasting model with extended range and modification capability

Characterization and Forecasting:
Comparison

[Diagram: characterization versus forecasting. Labels: Original Workload; Past Workload; Characterization Model; Forecasting Model; Statistical Measurements; Exact Measurements; Future of Past Workload; Real Future Workload; System/Simulator; Performance Measurements; Real Time Prediction; Model modification to create hypothetical scenarios for capacity planning; Scheduling/Resource Management based on prediction. Yes/No markers accompany several entries.]

Forecasting Feasibility Study:
Data Preprocessing

Pattern isolation:
  Workloads, enterprise workloads in particular, contain patterns
    9-5, M-F, monthly, yearly, holidays
  Individual patterns = individual signals with unique frequencies
  Signal decomposition techniques
Difficulties:
  Different arrival sources carry different patterns
  Patterns that exist only for a period of time

Forecasting Feasibility Study:
Arrival Patterns

[Figure: Arrival Time Histogram Comparison (Wednesday and Thursday, last week of May 2006)]
[Figure: Arrival Time Histogram Comparison (Wednesdays, June 2006)]
Existence of daily arrival patterns

Forecasting Feasibility Study:
Arrival Patterns

[Figure: Comparing IMFs of adjacent groups. 1.1, 1.2, 1.3, and 1.4 are the IMFs of the first group (4193 - 10,000); 2.1, 2.2, 2.3, and 2.4 are for the second group (0 - 5,000); and 3.1, 3.2, 3.3, and 3.4 are for the overall group (0 - 10,000).]
IMFs generated from adjacent data subsets exhibit a sense of continuity

Forecasting Feasibility Study:
Hypothesis

IMFs can be used to isolate signals with patterns, which then serve as inputs to a forecasting technique

Forecasting Feasibility Study:
Algorithm and Evaluation Metric

Algorithm:
  Use two sets of data: estimation data and prediction data
  Calculate the optimal estimated weights from the estimation data set
  Apply the calculated weights to the prediction data set to obtain the future data
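
One way to read these steps is as a linear fit. The sketch below assumes that "optimal" means least-squares and that the forecast is a weighted sum of the input signals (for example, IMFs); the slides do not state either, and all names here are hypothetical. The weights are fitted on the estimation inputs and target, then applied unchanged to the prediction inputs.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_weights(inputs, target):
    """Least-squares weights: minimize sum_t (target[t] - sum_k w[k]*inputs[k][t])**2.
    ASSUMPTION: least squares is one possible meaning of 'optimal'."""
    k = len(inputs)
    gram = [[sum(p * q for p, q in zip(inputs[i], inputs[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(p * y for p, y in zip(inputs[i], target)) for i in range(k)]
    return solve_linear_system(gram, rhs)

def apply_weights(inputs, weights):
    """Weighted sum of the input signals, sample by sample."""
    return [sum(w * s[t] for w, s in zip(weights, inputs))
            for t in range(len(inputs[0]))]

# Made-up numbers for illustration only.
estimation_inputs = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
estimation_target = [5.0, 4.0, 9.0, 8.0]
weights = fit_weights(estimation_inputs, estimation_target)

prediction_inputs = [[2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 0.0, 1.0]]
forecast = apply_weights(prediction_inputs, weights)
print(weights, forecast)
```

The solver divides by zero if the input signals are linearly dependent, so a practical version would need regularization or a rank check.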

Evaluation:
  Mean Absolute Percentage Error (MAPE)

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{\lvert \mathrm{predicted}_i - \mathrm{measured}_i \rvert}{\mathrm{measured}_i}$$
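
As a concrete illustration of the metric, the sketch below computes MAPE under the standard definition: the mean over all samples of the absolute error divided by the measured value. The function name and the numbers are hypothetical.

```python
def mape(predicted, measured):
    """Mean absolute percentage error, returned as a fraction (0.1 means 10%).
    Raises ZeroDivisionError if any measured value is zero."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / m for p, m in zip(predicted, measured))
    return total / len(measured)

# Made-up values for illustration only.
error = mape([110.0, 90.0], [100.0, 100.0])
print(f"{error:.1%}")
```

Hourly arrival histograms can contain empty buckets, so a practical version has to decide how to handle measured values of zero, for example by skipping those buckets.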

Forecasting Feasibility Study:
Preliminary Results

Experiment   IMF Count   Estimation MAPE   Prediction MAPE
1                        0.0%              53.89%
2                        13.68%            32.27%
3            10          2.65%             39.08%
4            12          0.9%              36.2%

Experiment 1:
  Predict the first Thursday of June 2006 based on the original histograms of the first Wednesday of June 2006 and the last Wednesday and Thursday of May 2006.
Experiment 2:
  Predict the first Thursday of June 2006 with the same data set, but with the days decomposed into IMFs.
Experiments 3 and 4:
  Predict the first Thursday of June 2006 using the IMFs of the first Wednesday of June and the last Wednesday and Thursday of May.
  The ranges of the empirical data used by the EMD process are extended.
  Experiment 3: Full month of May until the first Wednesday of June.
  Experiment 4: Full months of April and May until the first Wednesday of June.

Forecasting Feasibility Study:
Preliminary Analysis

Experiments with EMD preprocessing offer a better prediction result.
An increase in data retention from Experiment 3 to Experiment 4 improved the prediction.
Experiment 2:
  Low number of IMFs
  Low estimation MAPE
  High prediction MAPE

Forecasting Feasibility Study:
Conclusion

EMD: Potential data processing platform
Need better predictive tools to increase the accuracy of EMD-based predictions

Proposed Work

Prediction techniques for EMD-based data
Comparison of EMD as a data preprocessing tool against traditional decomposition techniques (Wavelet and Fourier)
Application on different workloads
Effect of the length of retained data for forecasting

Questions?
