
Advances in CER Development:

Errors-in-Variables Regression
Raymond Covert
Technical Director
MCR, LLC
rcovert@mcri.com

Presented to the
European Aerospace Working Group on
Cost Engineering (EACE)

Frascati, Italy
24-25 April 2007
08/10/09 Reprinted with permission of MCR, LLC
© MCR, LLC
Agenda

• Introduction
• Errors-In-Variables Regression
• Sources of Uncertainty
• Examples
– CER Regression with Normalization Uncertainty
– CER Regression with Fuzzy Cost Drivers
– Spacecraft EPS NR and REC CER Example
• EIV Modeling Benefits and Drawbacks
• Summary

Introduction

• At the Joint EACE/SSCAG/SCAF Meeting we introduced
Errors-in-Variables (EIV) regression [Ref. 1]
– Used fictitious data to highlight effects of uncertainty and fuzzy
variables in CER development
• Recently, we experimented with EIV regression using
real cost data
– Abandoned all assumptions and began treating normalization
and regression problem as completely random process
– Exposed additional sources of uncertainty that warrant use of
EIV regression over traditional methods
– Exposed strengths and weaknesses of techniques
• This presentation provides highlights of original
presentation and recent advances

Background

• Traditional regression techniques used in
statistically derived cost and schedule relationships:
– Ordinary least squares (OLS)
• Minimizes sum of squares of errors
• For linear relationships with additive error term: y=a+bx+ε
– Log-OLS
• Minimizes sum of squares of log of errors
• For power relationships with multiplicative error term: y=a*x^b*ε
– Constrained optimization
• Minimizes a penalty function (sum of squares of errors,
percent errors, etc.) while constraining some other term (e.g.,
bias =0)

Traditional regression techniques used in cost analysis assume
that independent variables are constant and known exactly
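
The two closed-form fits listed on this slide can be sketched in a few lines. This is a minimal illustration in plain Python, not the tooling used in the presentation; the data are hypothetical and noise-free so the recovered coefficients are easy to check.

```python
import math

def ols(xs, ys):
    """Closed-form OLS for y = a + b*x (additive error). Returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def log_ols(xs, ys):
    """Log-OLS for y = a * x**b (multiplicative error): OLS on (ln x, ln y)."""
    ln_a, b = ols([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(ln_a), b

# Hypothetical cost-driver values; both fits treat them as exact constants.
xs = [1.0, 2.0, 4.0, 8.0]
lin_a, lin_b = ols(xs, [3.0 + 2.0 * x for x in xs])
pow_a, pow_b = log_ols(xs, [5.0 * x ** 0.7 for x in xs])
print(lin_a, lin_b, pow_a, pow_b)
```

Both functions take the x values as given, which is exactly the assumption EIV regression relaxes.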
Uncertainty in Dependent and
Independent Variables

• However, even independent variables are not necessarily
constant parameters
• They may be random variables with uncertainty due to
the following:
– Normalizing data (nonrecurring [NR] and recurring [REC] split,
inflation assumptions, treatment of qualification and engineering
units [EU])
– Uncertain cost driver values (multiple “versions” of weight,
power, etc.)
– “Fuzzy” cost drivers such as percent new design, manufacturing
complexity, design difficulty, etc.
• Is there a method of regression that accommodates this
additional uncertainty?
Both independent and dependent variables may be
uncertain parameters (random variables)
Errors-in-Variables Regression

• Errors-in-variables (EIV) is a robust modeling technique
in statistics that assumes every variable can have error or
noise
• Also referred to as Total Least Squares (TLS)
• Started with R. J. Adcock’s one-page paper in The
Analyst “A Problem in Least Squares” (Des Moines, Iowa)
in 1878
• Simple linear regression (OLS) is special case in which
we assume no measurement errors in independent
variables
• P. Foussier presented EIV regression from different
perspective in his 2006 ISPA Conference paper “Palliating
the Bias Introduced by Linear Regression” [Ref. 2]
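
For a straight line with equal error variance in x and y, total least squares has a closed form (orthogonal regression). The sketch below is that textbook special case only, written in plain Python with hypothetical data; it is not the Monte Carlo procedure used later in this presentation.

```python
import math

def moments(xs, ys):
    """Return means and centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def ols_line(xs, ys):
    """OLS: minimizes vertical distances only (x assumed exact)."""
    mx, my, sxx, _, sxy = moments(xs, ys)
    b = sxy / sxx
    return my - b * mx, b

def tls_line(xs, ys):
    """TLS: minimizes perpendicular distances (equal error in x and y).
    Requires a nonzero cross-product sum sxy."""
    mx, my, sxx, syy, sxy = moments(xs, ys)
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return my - b * mx, b

# Hypothetical scattered data points.
xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 1.0, 3.0]
ols_a, ols_b = ols_line(xs, ys)
tls_a, tls_b = tls_line(xs, ys)
print(ols_b, tls_b)
```

With positively correlated data the OLS slope is never steeper than the TLS slope, which is the attenuation that ignoring errors in x introduces.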
Advancing the State of the Art
in Regression

• Constrained Optimization allows unprecedented freedom
over traditional methods
– OLS and Log-OLS are simple, analytical solutions, but
• They restrict us to CERs of the form y=a+bx+e or y=a*x^b*e,
where [a, b] are coefficients and e is additive or multiplicative
error term, respectively
• OLS produces constant error term, Log OLS produces biased
results (that require correction)
– Constrained optimization allows freedom to choose
• Form of the CER (e.g., y=a+b*x^c*q^d*e)
• How we wish to model the error term, multiplicative or additive
• Whether to eliminate bias (which we can constrain to zero)
• EIV modeling allows
– Same freedoms of constrained optimization
– Ability to include effects of uncertainty in our data
Regressing Constant Variables

• OLS, Log-OLS and Constrained Optimization Regression
techniques assume constant values for independent (x)
variables

[Figure: cost, $ (y), versus cost driver (x), showing historical data points, the cost estimating relationship Cost = a + b*X^c, and standard percent error bounds]

Traditional regression techniques used in cost analysis assume
that independent variables are constant and known exactly
Regressing Random Variables

• EIV regression assumes uncertain (random) values for
both independent (x) and dependent (y) variables

[Figure: cost, $ (y), versus cost driver (x), showing historical data distributions (random values for (x, y)), the cost estimating relationship Cost = a + b*X^c, and standard percent error bounds]

EIV regression assumes variables are uncertain (random)
EIV Regression
Tools and Techniques

• We need two tools to perform EIV regression


– Monte Carlo simulation to model uncertainty
– Constrained optimization tool to solve for CER
coefficients under constraint (e.g., zero bias)
• We tested two methods to perform EIV regression
– Crystal Ball with OptQuest®, which has these two tools
built into one
• Benefits: Simple spreadsheet application, search for global
minimum
• Drawbacks: Coefficient “search” takes a lot of time (hours)
– Dump trials into a spreadsheet and perform
regression using Premium Solver
• Benefits: Finds minimum rather quickly (minutes)
• Drawbacks: May not be global minimum, spreadsheet is large

EIV Using Crystal Ball
with OptQuest®
• Uncertain variables can be modeled in spreadsheet using
statistical simulation tool (Crystal Ball) with optimization
capability (OptQuest®)
• Random variables defined for uncertain variables that
constitute x,y data points - cost drivers, normalization
assumptions
• Outputs (forecasts) from Statistical Simulation defined for
Bias and Percent Standard Error
• CER coefficients defined as decision variables - Find
optimum coefficients that give minimum mean of standard
error under (near) zero bias constraint
• During optimization, random Variables are generated for x
and y variables - Examples that follow use 5000 trials
• CER Coefficients are tested for each set of (5000) trials
• OptQuest® determines optimum coefficients using scatter
search and tabu search techniques (does not find minima
via gradient approach)
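
The "decision variables plus constraint" setup can be illustrated with a very plain random local search. This is a stand-in sketch only: it is not OptQuest's scatter or tabu search. The data are the normalized antenna T1 costs from the example in this deck; the n - 1 denominator in %SE, the penalty weight, the step size and the starting point are all assumptions.

```python
import math
import random

# (diameter m, frequency GHz, actual T1 cost BY05$K) from the antenna example
DATA = [(3, 2, 72.55), (3, 2, 85.85), (5, 2, 77.55), (15, 22, 11881.17),
        (20, 22, 15666.08), (4, 10, 1242.96), (3, 12, 2088.57), (5, 10, 2438.03)]

def stats(coef):
    """Return (%SE, %bias) as fractions for T1 = a*Diam^b*Freq^c + d."""
    a, b, c, d = coef
    errs = []
    for diam, freq, act in DATA:
        est = a * diam ** b * freq ** c + d
        if est <= 0:                       # reject non-physical estimates
            return math.inf, math.inf
        errs.append((act - est) / est)
    bias = sum(errs) / len(errs)
    se = math.sqrt(sum(e * e for e in errs) / (len(errs) - 1))  # assumed n - 1
    return se, bias

def penalized(coef, tol=0.005, weight=10.0):
    """%SE plus a penalty once |bias| exceeds the +/- 0.5% band."""
    se, bias = stats(coef)
    return se + weight * max(0.0, abs(bias) - tol)

def random_search(start, iters=5000, seed=1):
    """Keep a random perturbation only if the penalized objective improves."""
    rng = random.Random(seed)
    best, best_val = list(start), penalized(start)
    for _ in range(iters):
        cand = [v + rng.gauss(0.0, 0.02 * (abs(v) + 1.0)) for v in best]
        val = penalized(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val

start = [10.0, 0.5, 2.0, 0.0]              # hypothetical starting guess
best, best_val = random_search(start)
print(best, stats(best))
```

Like any local search, this can stall in a local minimum, which is the trade-off between search methods raised in this deck.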
Optimizing Using Crystal Ball
and Premium Solver
• Model uncertain variables in spreadsheet using
statistical simulation tool such as Crystal Ball
• Random variables defined for uncertain variables that
constitute x,y data points
– Cost driver uncertainty
– NR/REC split
– Quantities (EDUs, Qual and Protoqual units)
– Inflation
• Outputs (forecasts) from Statistical Simulation are
defined
– Uncertain Input variables (cost drivers, quantities)
– Output variables (nonrecurring and recurring costs)
– Trial values (1000 trials) for each are dumped into a spreadsheet
• Data are regressed using constrained optimization
(Premium Solver)
– Uses a combined scatter search and gradient approach to find global
minimum for percent error under the constraint bias =0
– Produces coefficients for CER
Sources of Uncertainty

• Uncertainty in x, y variables can originate from:


– Assumptions of Normalization process
• A posteriori values of cost drivers (e.g., weight) are typically chosen
as best hardware cost drivers; however, we do not know these
values a priori with certainty (weight is estimated at program start)
• Treatment of qualification units and EUs (Factor of T1 cost?)
• How much of total cost is NR vs. REC? We typically rely on
contractor inputs, guesses and assumptions
• Applying inflation (particularly to older data points): Should older
cost data be treated with more uncertainty? (Yes)
– Incomplete/inconsistent or otherwise “fuzzy” data
• How should we model parameters such as new design percentage?
– Combining data from multiple data sources/models
• All vendors treat cost data differently
• We can use errors-in-variables constrained optimization
to find coefficients for CERs with uncertain data by
accounting for uncertainty in the normalization process
Cost Model Development Flow

[Figure: cost model development flow in four stages: Data Collection (cost reports, schedule reports, design reviews), Data Normalization (filter contract “riders”, scope/WBS definition, quantity, inflation, technology, normalization assumptions), CER Development (regression of data points and cost drivers into CER functions and coefficients, with data statistics and fit statistics), and CER Documentation. Sources of uncertainty are marked in red.]
Uncertainty in Data Normalization

• Filter: Decide what to include in cost
– Riders: things that are not relevant to the contract or program
– Engineering & Cost Change Proposals
• Scoping: Consistent definitions and content
– Re-allocation of data into WBS elements
– Need to determine where qualification units, prototype
units and protoflight units should be booked
• Quantity: Consistent units for regression
– Data will be for 1 unit, 100s of units, 10th unit, etc.
– Need to either use quantity as an independent variable (QAIV)
or normalize to a base set of units using an assumed
learning curve
• Inflation: Consistent economic year
– Use a consistent set of inflation indices (e.g., DoD 3020
and 3600)
• Technology: Consistent technology maturity
– Treat all data as if they were built in economic year of
the model
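
Quantity normalization with an assumed learning curve can be sketched as follows. The deck does not say which learning-curve theory is used, so unit (Crawford) theory is an assumption here, and the numbers are hypothetical.

```python
import math

def unit_cost(t1, n, learning_rate):
    """Cost of the n-th unit under unit learning-curve theory:
    cost(n) = T1 * n**b, with b = ln(learning_rate) / ln(2)."""
    b = math.log(learning_rate) / math.log(2.0)
    return t1 * n ** b

def t1_from_lot(total_cost, quantity, learning_rate):
    """Back out the theoretical first unit cost (T1) from the total
    recurring cost of units 1..quantity."""
    b = math.log(learning_rate) / math.log(2.0)
    return total_cost / sum(n ** b for n in range(1, quantity + 1))

# Hypothetical lot: 4 units costing 370 in total, assumed 95% learning rate.
t1 = t1_from_lot(370.0, 4, 0.95)
print(t1, unit_cost(t1, 4, 0.95))
```

Because the derived T1 depends directly on the assumed learning rate, any uncertainty in that rate carries straight into the normalized cost used for regression.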
The EIV Problem
with “Fuzzy” Inputs

• New Design is defined by the following “fuzzy” variables
modeled as triangular probability distributions

Category  New Design     New Design %  Low   Most Likely  High
1         None           0.1           0     0.1          0.2
2         Minor Mods     0.2           0.1   0.2          0.5
3         Moderate Mods  0.5           0.4   0.5          0.8
4         Major Mods     0.75          0.6   0.75         0.9
5         New Design     1             0.9   1            1
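
Drawing a new-design fraction from these triangular distributions takes only the standard library. A minimal sketch: the dictionary copies the Low / Most Likely / High columns from this slide, and the seed and trial count are arbitrary.

```python
import random

# category -> (low, most likely, high), copied from the table on this slide
NEW_DESIGN = {
    1: (0.0, 0.1, 0.2),
    2: (0.1, 0.2, 0.5),
    3: (0.4, 0.5, 0.8),
    4: (0.6, 0.75, 0.9),
    5: (0.9, 1.0, 1.0),
}

def sample_new_design(category, rng):
    """Draw one new-design fraction for an ordinal category."""
    low, mode, high = NEW_DESIGN[category]
    return rng.triangular(low, high, mode)   # argument order: low, high, mode

rng = random.Random(0)
draws = [sample_new_design(3, rng) for _ in range(5000)]
mean_nd = sum(draws) / len(draws)
print(min(draws), mean_nd, max(draws))
```

The mean of a triangular distribution is (low + mode + high) / 3, so category 3 averages about 0.57 rather than its most likely value of 0.5.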

CER Development Example

• Military Fixed and Mobile Terminal Antennas


– Used for satellite communications

         Antenna      Frequency,  Slew Rate,  New     T1 Actual  NR Actual  Cost
Program  Diameter, m  GHz         deg/sec     Design  Cost       Cost       FY
1        3            2           0           1       45.59      2.28       1982
2        3            2           1.05        5       57.32      171.95     1985
3        5            2           0           2       50.73      25.37      1984
4        15           22          0           1       9,136.89   456.84     1992
5        20           22          1           3       10,458.80  10,458.80  1985
6        4            10          0           3       1,123.54   1,123.54   2000
7        3            12          0.95        5       1,850.14   5,550.43   1999
8        5            10          0.5         4       1,729.35   3,458.69   1988

• (Fictitious) Cost and technical data used to derive
theoretical first unit (T1) and nonrecurring (NR) cost
relationships
T1 Cost BY05$K = (a * Diam^b * Freq^c + d) * ε1 (ε1 = error)
NR Cost BY05$K = (e * (x*T1)^f + g) * ε2 (ε2 = error)

Sources of Uncertainty in
CER-Development Example
Questions raised by each column of the antenna data table:
• Antenna diameter: ± 0.5 m?
• Frequency: center frequency, low, or high cutoff frequency?
• Slew rate: peak, average, or sustainable slew rate?
• New design: fuzzy definition of “new design” using ordinal scale
• T1 actual cost: assumed learning curve?
• Cost units: “then year”; inflation rates?
• Cost FY: different fiscal years (midyear, peak year?)
Determining CER Coefficient Values
Using Constant (x, y) Values

• Using Excel Solver:
– Solve for T1 CER coefficients using ZPB-MPE*

                         Antenna      Frequency,  Slew Rate,  Actual T1 Cost  Est Cost   (Act-Est)
Program                  Diameter, m  GHz         deg/sec     BY05$K          BY05$K     /Est
1                        3            2           0           72.55           75.06      (0.03)
2                        3            2           1.05        85.85           75.06      0.14
3                        5            2           0           77.55           86.33      (0.10)
4                        15           22          0           11,881.17       13,192.74  (0.10)
5                        20           22          1           15,666.08       14,588.92  0.07
6                        4            10          0           1,242.96        1,689.85   (0.26)
7                        3            12          0.95        2,088.57        2,207.27   (0.05)
8                        5            10          0.5         2,438.03        1,825.74   0.34
Correl with Cost         0.9844       0.9192      0.2031      1.0000
Correlation with %Error  0.0318       -0.0454     0.5389      0.0457

Coefficients: testa = 9.583852, testb = 0.350121, testc = 2.030987, testd = 17.52339
% Bias = 0.0000, %SE = 18.24

– Minimize Percent Standard Error (%SE)
– Constraint: Bias = 0.00%
• T1 BY05$K = [(9.58 * Diam^0.350 * Freq^2.031) + 17.52] * ε

* Zero percent bias, minimum percentage error
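
The two ZPB-MPE statistics can be recomputed from the numbers on this slide. This is a sketch under one assumption: the slide does not give the %SE denominator, and n - 1 is used here because it lands close to the reported 18.24%.

```python
import math

# Coefficients testa..testd as listed on this slide
COEF = (9.583852, 0.350121, 2.030987, 17.52339)

# (diameter m, frequency GHz, actual T1 cost BY05$K) for programs 1-8
ROWS = [(3, 2, 72.55), (3, 2, 85.85), (5, 2, 77.55), (15, 22, 11881.17),
        (20, 22, 15666.08), (4, 10, 1242.96), (3, 12, 2088.57), (5, 10, 2438.03)]

def t1_estimate(diam, freq, coef=COEF):
    """T1 BY05$K = a * Diam^b * Freq^c + d (error term set to 1)."""
    a, b, c, d = coef
    return a * diam ** b * freq ** c + d

# Percent error is measured relative to the estimate: (Act - Est) / Est
errs = [(act - t1_estimate(diam, freq)) / t1_estimate(diam, freq)
        for diam, freq, act in ROWS]
pct_bias = sum(errs) / len(errs)                                  # constrained to ~0
pct_se = math.sqrt(sum(e * e for e in errs) / (len(errs) - 1))    # assumed n - 1
print(round(t1_estimate(3, 2), 2), pct_bias, pct_se)
```

Dividing by the estimate rather than the actual is what makes the error term multiplicative, matching the form of the CER.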


Determining CER Coefficient Values
Using Uncertain (x, y) Values

• Model data as uncertain parameters in statistical simulation
by defining random variables for:
– Independent (x) variables:
• Antenna diameter: ± 0.5 m (uniform distribution; -0.5, 0.49)
• Frequency: Low and High cutoff frequencies (uniform
distribution; f(low), f(high))
– Dependent (y) variable (cost):
• Inflation: Base inflation rate with 1% standard error (normal
distribution)
• Learning rate: Low=0.90, Most Likely=0.95 and High=1.00
(triangular distribution; 0.9, 0.95, 1.0)
• Cost Fiscal Year: ± 1 year (discrete distribution; -1, 0, 1)
• Solve for coefficients of the CER that provide:
– Minimized mean of percent standard error (which is now a
random variable)
– Near-zero mean of bias (also a random variable), within
±0.5%
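
Evaluating one candidate set of coefficients under these distributions can be sketched as a small Monte Carlo loop. Several pieces are assumptions, not taken from the presentation: the 3% base inflation rate, the ±10% frequency band standing in for the unstated cutoff frequencies, compound escalation to 2005, and the n - 1 denominator. Learning-rate uncertainty is left out because the slides give no lot quantities.

```python
import math
import random

# (diameter m, frequency GHz, then-year T1 cost, cost FY) from the antenna table
DATA = [(3, 2, 45.59, 1982), (3, 2, 57.32, 1985), (5, 2, 50.73, 1984),
        (15, 22, 9136.89, 1992), (20, 22, 10458.80, 1985),
        (4, 10, 1123.54, 2000), (3, 12, 1850.14, 1999), (5, 10, 1729.35, 1988)]
BASE_YEAR = 2005
BASE_RATE = 0.03     # hypothetical base inflation rate
FREQ_SPREAD = 0.10   # hypothetical stand-in for low/high cutoff frequencies

def one_trial(coef, rng):
    """One simulation trial: perturb x and y, then score the CER."""
    a, b, c, d = coef
    errs = []
    for diam, freq, cost, fy in DATA:
        x_diam = diam + rng.uniform(-0.5, 0.5)              # +/- 0.5 m, uniform
        x_freq = freq * rng.uniform(1 - FREQ_SPREAD, 1 + FREQ_SPREAD)
        rate = rng.gauss(BASE_RATE, 0.01)                   # 1% standard error
        year = fy + rng.choice((-1, 0, 1))                  # +/- 1 fiscal year
        y = cost * (1.0 + rate) ** (BASE_YEAR - year)       # escalate to base year
        est = a * x_diam ** b * x_freq ** c + d
        errs.append((y - est) / est)
    bias = sum(errs) / len(errs)
    se = math.sqrt(sum(e * e for e in errs) / (len(errs) - 1))
    return se, bias

def mean_stats(coef, trials=5000, seed=0):
    """Mean %SE and mean %bias over all trials (the two forecasts)."""
    rng = random.Random(seed)
    results = [one_trial(coef, rng) for _ in range(trials)]
    return (sum(r[0] for r in results) / trials,
            sum(r[1] for r in results) / trials)

mean_se, mean_bias = mean_stats((8.64, 0.64, 1.74, 2.87))
print(mean_se, mean_bias)
```

An optimizer would call `mean_stats` once per candidate coefficient set, which is why the coefficient search is slow: every evaluation is a full simulation.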
EIV Solution for T1 CER

• Solution converges after 2395 simulations
T1 BY05$K = [(8.64 * Diam^0.64 * Freq^1.74) + 2.87] * ε
• ε2 = 27.9 % standard percent error
• Bias = +0.36%

            Objective:  Requirement:
            Minimize    -0.5 <= % Bias
Simulation  % SE Mean   Mean <= 0.5             testa  testb  testc  testd
1           59.4571     -0.519265 (Infeasible)  13.35  -0.37  2.51   -20
228         88.5124     -0.160674               20     -1.21  3      16.37
229         88.4783     -0.24651                20     -1.21  3      16.52
295         88.3915     -0.467954               20     -1.21  3      16.91
300         88.3827     -0.490526               20     -1.21  3      16.95
324         88.3806     -0.496165               20     -1.21  3      16.96
741         78.934      -0.309184               1      2      1.76   20
797         70.0248     0.108282                1      1.76   1.92   20
817         67.5865     -0.097794               1      1.69   1.97   20
821         65.4752     -0.192054               1      1.62   2.02   20
859         65.3195     0.490049                1      1.6    2.03   20
870         64.797      0.029595                1      1.59   2.04   20
903         64.2946     -0.423794               1      1.58   2.05   20
920         64.1649     0.268826                1      1.56   2.06   20
957         63.6943     -0.176813               1      1.55   2.07   20
963         62.7213     -0.343586               1      1.51   2.1    20
975         62.6332     0.369363                1      1.49   2.11   20
991         62.2457     -0.054359               1      1.48   2.12   20
1001        61.8773     -0.471419               1      1.47   2.13   20
1083        60.5676     0.131554                1      1.37   2.2    20
1119        60.328      -0.251383               1      1.36   2.21   20
1254        60.1309     0.12696                 1      1.33   2.23   20
1257        59.9383     -0.243043               1      1.32   2.24   20
1276        59.8163     0.15858                 1      1.29   2.26   20
1288        59.669      -0.198811               1      1.28   2.27   20
1308        59.6207     0.22575                 1      1.25   2.29   20
1315        59.5172     -0.119347               1      1.24   2.3    20
1316        59.4287     -0.458659               1      1.23   2.31   20
Best: 2935  27.918      0.360542                8.64   0.64   1.74   2.87
Actual vs. Estimated
Plot of T1 CER

Contributors to Variance in
Dependent Variables

• Can use Crystal Ball sensitivity tool to see what
variables contribute to variance of %SE and %Bias

[Figure: Crystal Ball sensitivity chart; labeled contributors are frequency errors, learning errors, inflation errors and diameter errors]
Regression With Uncertain Variables:
Nonrecurring CER

• Some cost drivers are subjective values
(particularly in nonrecurring CERs):
– Amount of “new” design (0% to 100%)
– Complexity of development, manufacturing or testing
process
• Solve the ZPB-MPE problem under uncertainty
using Crystal Ball with OptQuest®
• Use our estimated T1 cost, T1EST (from last
regression) and percent new design (ND) as the
cost drivers
– ND is a “fuzzy” cost driver (it has a loose definition)
NR Cost BY05$K = [e * (T1EST*ND)^f + g] * ε2 (ε2 = error)
Solving the EIV Problem
with “Fuzzy” Inputs

• Define new design categories as assumption variables
• Define coefficients e, f and g as decision variables
• Define % SE and % bias as forecast values
• Use OptQuest® to minimize mean of % SE and
constrain mean of % bias to +/- 0.5%
NR Cost BY05$K = [e * (T1EST*ND)^f + g] * ε2
EIV Solution for
Nonrecurring CER

• Solution converges after 1225 simulations
NR BY05$K = [2.34 * (T1EST*ND)^0.99 + 0.27] * ε2
• ε2 = 72.6 % standard percent error
• Bias = -0.27%

            Objective:    Requirement:
            Minimize      -0.5 <= % NR Bias
Simulation  % NR SE Mean  Mean <= 0.5        teste  testf  testg
1           73.0909       -0.430346          2.33   0.99   1.31
122         73.0808       -0.404978          2.33   0.99   1.26
209         73.0707       -0.379393          2.33   0.99   1.21
210         73.0565       -0.343192          2.33   0.99   1.14
226         73.0443       -0.311793          2.33   0.99   1.08
229         73.0342       -0.285353          2.33   0.99   1.03
230         73.016        -0.237099          2.33   0.99   0.94
237         72.9979       -0.187932          2.33   0.99   0.85
238         72.9879       -0.160194          2.33   0.99   0.8
239         72.9781       -0.132133          2.33   0.99   0.75
245         72.94         -0.016298          2.33   0.99   0.55
246         72.931        0.013673           2.33   0.99   0.5
592         72.8831       0.418032           2.33   0.99   -0.09
1007        72.6643       0.091903           2.18   1      0.82
Best: 1225  72.5831       -0.269206          2.34   0.99   0.27
Spacecraft EPS NR and REC CER

• The problem: Find a set of NR and REC CERs for Electrical
Power System (EPS)
• Approach:
– Use USCM data set (24 Programs)
– Apply uncertainty to a posteriori values of cost drivers EPS
weight, Beginning of Life Power (BOLP) and Battery
Capacity (in Amp-Hours)
– Apply probabilistic bounds to US Department of Defense
(DoD) inflation indices using Consumer Price Index (CPI),
select DRI Indices and US Aerospace Contractors
– Abandon assumed learning rate of 95% to derive T1 cost
and use quantity as an independent variable (QAIV)
approach to develop REC CER [Ref. 3]
– Use EIV to find CERs with minimum percent error with zero
bias and select best cost driver from Weight, BOLP and
Battery Capacity

Candidate Solutions

• Multiple solutions produced – Best driver is BOLP


– Many local minima found
– Coefficients seem to be correlated

Relationship of
Candidate Coefficients

• Produced many viable candidate solutions


• Premium Solver could not find global minimum
• How can we find the global minimum the first time?
• Coefficients produced by each candidate seem to be
related to each other
• This information may be helpful in finding re-sampling
bounds

[Figure: two scatter plots of candidate-solution coefficients, each with Coef a1 on the x-axis (-1200 to 200) and a y-axis labeled Coef b1 (0 to 600 in the first plot, 0 to 1.000 in the second)]
Resulting REC CER

• Resulting EPS REC CER (minimum SE, zero bias)
– Older data has wider x-axis
– Data with most uncertainty is de-weighted in
regression

REC = (49.4 + 342 * BOLP^0.568 * Q^0.926) * err; s = 0.423, bias = 0

[Figure: Normalized Actuals vs. Estimates, USCM REC EPS CER; x-axis: Normalized Actual Cost (FY06$K), y-axis: EIV Estimated Cost (FY06$K), both from 0 to 250,000]
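
Applying the resulting REC CER is a one-line calculation. A minimal sketch with the error term set to 1; the input values are hypothetical, and the BOLP units are whatever the USCM data set uses (not stated on the slide).

```python
def eps_rec_cost(bolp, quantity):
    """Point estimate, FY06$K, from REC = 49.4 + 342 * BOLP^0.568 * Q^0.926."""
    return 49.4 + 342.0 * bolp ** 0.568 * quantity ** 0.926

# Hypothetical inputs: BOLP of 1000 for a buy of 1 unit and of 4 units.
one_unit = eps_rec_cost(1000.0, 1)
four_units = eps_rec_cost(1000.0, 4)
print(one_unit, four_units, four_units / 4)
```

With quantity as an independent variable, the Q exponent below 1 plays the role an assumed learning rate would otherwise play: average cost per unit falls as Q grows.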
Minimum NR+REC Problem

• The last regression provided the REC CER coefficients
that give the minimum percent error under the zero
bias constraint
• We can produce a NR CER using the same type of
minimization criteria and constraints
• What happens when we try to find the coefficients
for the NR and REC CERs at the same time?
– This makes sense, since we are just going to add NR
and REC results when we use the CER
• How to approach this problem:
– Minimize the NR+REC percent error
– Constrain the total (NR+REC) bias to zero
• We produce two CERs with comparatively small
standard error but each has a bias
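
A small arithmetic example shows why a zero-bias total does not imply zero-bias components. The numbers are hypothetical, and bias is taken as the mean of (actual - estimate) / estimate, consistent with the percent error used elsewhere in this deck.

```python
def pct_bias(actuals, estimates):
    """Mean percent error, measured relative to the estimate."""
    return sum((a - e) / e for a, e in zip(actuals, estimates)) / len(actuals)

# Hypothetical two-program data set
nr_act, nr_est = [100.0, 200.0], [150.0, 300.0]     # NR over-estimated
rec_act, rec_est = [300.0, 600.0], [250.0, 500.0]   # REC under-estimated

total_act = [n + r for n, r in zip(nr_act, rec_act)]
total_est = [n + r for n, r in zip(nr_est, rec_est)]

nr_bias = pct_bias(nr_act, nr_est)
rec_bias = pct_bias(rec_act, rec_est)
total_bias = pct_bias(total_act, total_est)
print(nr_bias, rec_bias, total_bias)
```

The NR and REC errors offset each other exactly in the total, so a constraint on total bias alone leaves the individual CERs free to drift.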
EPS NR CER

• Total (NR+REC) percent SE = 43.9%, bias = 0.0%


• NR percent SE = 75.4%, bias = -181%
NR = (46.5 + 206.41 * BOLP^0.662) * err

[Figure: Normalized Actuals vs. Estimates, USCM NR EPS CER; x-axis: Normalized Actual Cost (FY06$K), y-axis: EIV Estimated Cost (FY06$K), both on log scales from 1,000 to 1,000,000]
EPS REC CER

• Total (NR+REC) percent SE = 43.9%, bias = 0.0%


• REC percent SE = 51.4%, bias = 408%

REC = (2567.6 + 277.45 * BOLP^0.507 * Q^1.231) * err

[Figure: Normalized Actuals vs. Estimates, USCM REC EPS CER; x-axis: Normalized Actual Cost (FY06$K), y-axis: EIV Estimated Cost (FY06$K), both on log scales from 1,000 to 1,000,000]
New Questions Arise

• When we develop CERs independently we may be
producing biased total (NR+REC) results
• So, should we develop them in tandem in the
future?
• Why not regress the entire data set at once and
minimize the total NR+REC error for the spacecraft
bus?

• I don’t know the right answer, but we will certainly
be looking into these issues in the future
EIV Modeling
Benefits and Drawbacks

• Many benefits to using EIV over traditional methods of treating
data and regression
– More realistic treatment of uncertainty
– More realistic accounting of uncertainty in cost drivers
– Provides more accurate picture of CER uncertainty
• There are a few drawbacks
– Can be time consuming
• Spreadsheet preparation
– Choice between two search techniques
• Scatter search (time consuming)
• Local gradient search (may not find global minimum)
– Tool set needs to be better established
• Need to combine a Monte Carlo simulator with gradient search
and a genetic algorithm (to re-seed the search and find global
minimum)
Summary

• Errors-in-Variables (EIV) regression assumes that
every variable can have error or noise
• Also known as Total Least Squares (TLS)
• Uncertainty in dependent and independent variables
due to incomplete data and assumptions in the
normalization process
• Need Statistical Simulation tool with Optimization
utility (Crystal Ball with OptQuest® or Premium
Solver)
• With EIV we can create more realistic CERs
• Need better tools: Monte Carlo simulator + gradient
search + genetic algorithm

References

1. Covert, R., “Errors-In-Variables Regression”, Presented to the Joint
SSCAG/EACE/SCAF Meeting, September 19-21, 2006.
2. Foussier, P., “Palliating the Bias Introduced by Linear Regression”, ISPA
International Conference, Seattle WA, 23-26 May 2006.
3. Book, S. and Burgess, E., “A Way Out of the Learning-Rate Morass: Quantity
as an Independent Variable”, January 2003.

Further Reading:
• Quirino, P., "Robust Estimators of Errors-In-Variables Models Part 1"
(August 1, 2004), Department of Agricultural & Resource Economics (ARE),
University of California at Davis, ARE Working Papers, Paper 04-007.
• van Huffel, S.; Lemmerling, P. (Eds.), “Total Least Squares and Errors-in-
Variables Modeling: Analysis, Algorithms and Applications,” Springer
Verlag, 2002, ISBN: 1-4020-0476-1.
• Griliches, Z., "Errors in Variables and Other Unobservables," Econometrica,
Econometric Society, vol. 42(6), pages 971-98, November 1974.
• Pollock, D.S.G., “Topics in Econometrics: the Errors in Variables Model and
the Linear Regression Model”, unpublished notes, p. 1-4,
http://www.qmw.ac.uk/~ugte133/courses/mesomet/topics/ectopics.htm
