
Deloitte Consulting Advanced Analytics Group Presents:
Supply Chain Analytics Unit 1 Workbook

Contents
Welcome to Supply Chain Analytics Unit 1 ............................................................ 1
How to Use this Workbook ..................................................................................... 2
Section 1 Fundamentals of Operations Research (Part I) .................................. 3
Section 2 Network Problems (Part I) ................................................................... 8
Section 3 Applied Statistics (Part I) ................................................................... 17
Section 4 Fundamentals of Operations Research (Part II) ............................... 34
Section 5 Network Problems (Part II) ................................................................ 37
Section 6 Applied Statistics (Part II) .................................................................. 44
Solutions ............................................................................................................... 61


Welcome to Supply Chain
Analytics Unit 1
One of Deloitte's top priorities is to support the development of skills and
knowledge that enable practitioners to provide the highest level of client service.
In support of this objective, Deloitte's Advanced Analytics Group (DAAG) created
a set of courses and learning materials to expand the client service and technical
capabilities of practitioners interested in Supply Chain Analytics.
Supply Chain Analytics Unit 1 comprises six courses that serve as
prerequisites for Unit 2. Unit 2 introduces advanced topics in Supply Chain
Analytics such as Network, Inventory and Transport Optimization. Unit 1 provides
the foundation and knowledge needed to solve the business problems outlined in
Unit 2. The Unit 1 courses should be taken in the order they are presented.
How to Use this Workbook
This workbook is designed to support the Unit 1 Supply Chain Analytics training
and to provide the tools and information needed to apply it. This
workbook will:
Summarize key learning objectives
Provide an opportunity for reflection and a framework for understanding what
can occur on client engagements
Provide application based activities to embed learning and make it practical
Point to resources and tools that will assist in applying learning objectives
As you proceed through Unit 1, have this workbook available to complete all of
the activities and maximize the impact of the learning. The Course Information
and Activities section has suggested activities to help you apply what you are
learning and prepare you for Unit 2.
Section 1 Fundamentals
of Operations Research
(Part I)
Basic Concepts of Linear Programming
Overview of Linear Programming
Linear Programming is a technique used to arrive at an optimal decision
that is affected by various factors and constraints. Linear Programming
problems consist of two parts: an Objective Function and Constraints.
An objective function can be maximized or minimized.
Constraints are usually in the form of inequalities. Constraints exist because
certain limitations restrict the range of a variable's possible values.
Approach to Problem Solving
Identify the objective of the problem
Identify the decision variables and constraints on them
Write the objective function and constraints in terms of the decision variables
Add any implicit constraints
Arrange the equations into an organized format
Assumptions in Linearity
Proportionality
Additivity
Divisibility
Certainty
Exercise
Question 1.1: A diet is to contain at least 200 grams of carbohydrates, 100
grams of fat and 150 grams of protein. Two foods A and B are available. Food A
costs $2 per pound and food B costs $4 per pound. A pound of food A contains
10 grams of carbohydrates, 20 grams of fat and 15 grams of protein. A pound of
food B contains 25 grams of carbohydrates, 10 grams of fat and 20 grams of
protein. Formulate the problem as a Linear Programming problem so as to find
the minimum cost for a diet that consists of a mixture of these two foods and also
meets the minimum requirements.
Food Type     Carbohydrates (g/lb)   Fat (g/lb)   Protein (g/lb)   Cost ($) per pound
A             10                     20           15               2
B             25                     10           20               4
Requirement   200                    100          150






Review the correct answer in the Solutions section.
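If you want to check a formulation of this kind numerically, the sketch below uses SciPy's linprog (assuming SciPy is available; the decision variables xA and xB denote pounds of food A and B and are not part of the workbook's own solution). Since linprog expects "less than or equal to" constraints, the minimum-requirement constraints are multiplied by -1.

# Minimal sketch (not the workbook's official solution): checking a diet LP with SciPy.
from scipy.optimize import linprog

c = [2, 4]                       # cost per pound of food A and food B
# Requirements are ">=", so both sides are negated to express them as "<=".
A_ub = [[-10, -25],              # carbohydrates: 10xA + 25xB >= 200
        [-20, -10],              # fat:           20xA + 10xB >= 100
        [-15, -20]]              # protein:       15xA + 20xB >= 150
b_ub = [-200, -100, -150]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)            # optimal pounds of A and B, and the minimum cost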

Exercise Reflection: Use the space below to note what you have learned about
solving the previous Linear Programming problem.
________________________________________________________________
________________________________________________________________

The general form of the maximized objective function and constraints in Linear
Programming is represented as follows.
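The workbook's figure for this general form is not reproduced in this text; a standard statement of the canonical form, written in LaTeX notation, is:

\text{Maximize } Z = \sum_{j=1}^{n} c_j x_j
\text{subject to } \sum_{j=1}^{n} a_{ij} x_j \le b_i, \quad i = 1, \dots, m
x_j \ge 0, \quad j = 1, \dots, n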


Linear Programming Optimization Methods
Graphical Method of Solution
The graphical method is a simple way to solve Linear
Programming problems when there are two decision
variables, x1 and x2. We usually write these decision
variables as x and y instead of x1 and x2.
The graphical method includes two major steps:
The determination of the solution space that defines
the feasible region
The determination of the optimal solution from the
feasible region

Defining the Feasible Region


The following three steps are used to determine the feasible solution of a
Linear Programming problem:
1. Since the two decision variables x and y are non-negative, consider only the
first quadrant of the xy-plane
2. Draw the line for each constraint
Each line divides the first quadrant into two regions
Area under constraint 1: All the points in this area satisfy the inequality
3x + 4y ≤ 12
Area under constraint 2: All the points in this area satisfy the inequality
5x + 3y ≤ 15
3. Each point within the feasible solution meets all the constraints
Thus, the intersection of the two areas is the feasible area or feasible solution
of the Linear Programming problem.


Optimal Solution
The optimal solution to a Linear Programming problem occurs at
the corners of the feasible region. Another way to reach the
optimal solution is to plot the objective function for some arbitrary
value, like 6x + 5y = 12. Since we want to maximize 6x + 5y, we
plot another line for 6x + 5y = 20.
This line is parallel to the first and lies in the direction in which
the objective function increases. To maximize 6x + 5y, we keep
moving the line in that increasing direction.
We can move the line until it comes out of the feasible region.
The last point it will touch before it leaves the feasible region is
the corner point (2,3)
This point is the feasible point that has the highest value of the
objective function and is optimal
Exercise
Question 1.2: Using the graphical method of solution of a Linear Programming
problem, find the feasible solution for the problem of a decorative item dealer
whose Linear Programming problem is to maximize the profit function.

Objective Function: Z = 50x + 18y
Constraints:
2x + y ≤ 100
x + y ≤ 80
x ≥ 0, y ≥ 0









Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note the important things you have
learned about solving the previous Linear Programming problem using the
graphical method.
________________________________________________________________
________________________________________________________________
Business Applications
Linear programming is used to facilitate decision-making in business when there
are multiple trade-offs involved and an optimal outcome needs to be arrived at in
the face of various conditions. While it has roots in operations and supply chain,
it has applications across business functions/service lines.
Marketing Applications: Media Selection; Market Research
Financial Applications: Portfolio Selection; Financial Planning
Production Management: A Make-or-Buy Decision; Production Scheduling; Workforce Assignment
Product-Mix Applications: SKU Rationalization; Blending Problems


Section 2 Network
Problems (Part I)
Overview of Network Problems
Fundamentals of Network Flow Problems
Network Flow Problems are applied to business issues that can be formulated in
a network structure with nodes and arcs, and solved using special purpose
algorithms.
Common Terms
Node
Specific location in a network that can be of various types such as origin, destination,
and transshipment nodes.
Arc
Connector of two nodes, and the path between nodes along which materials move.
Arcs can be one-way or two-way in nature.
Flow Movement of materials / resources between nodes along an arc.
Capacity
Limitations on the amount of materials that can flow through an arc. Arcs can
possess both lower and upper capacity constraints.
Business Applications of Network Flow Problems
Distribution and transportation systems
Telecommunication networks
Oil & gas
Aerospace
Manufacturing
Telecommunications
Illustrative Examples of Business Applications of Network
Flow Problems
Distribution Networks
Sample Business Application: What quantity of goods should be sent from which plant given demand at a distribution center (DC)?
Physical Analog of Nodes: Plants, Distribution Centers, Warehouses
Physical Analog of Arcs: Road, Rail and Air Routes
Flow: Materials, Goods, Finished Products

Transportation Systems
Sample Business Application: What is the maximal number of vehicles that can be routed through a road system?
Physical Analog of Nodes: Intersections, Airports, Rail Yards
Physical Analog of Arcs: Highways, Airline Routes, Railbeds
Flow: Passengers, Freight, Vehicles, Operators

Manufacturing Scheduling
Sample Business Application: What is the optimal assignment of jobs to machines?
Physical Analog of Nodes: Machines, Jobs
Physical Analog of Arcs: Processing Time
Flow: Assignment and Sequencing of Jobs
Overview of Solution Methods
Solution Methods for Network Flow Problems
Common network flow problems can be solved primarily using three methods:
Rule-Based Algorithms: Problem-specific, optimal, less flexible
Linear Programming-Based Optimization: More flexible, time-consuming, commercial solver tool-based
Heuristics: Easy, but may be sub-optimal

Considerations for Choice of Solution Method
Solution Driven Considerations: Problem Size; Problem Complexity; Desired Accuracy Levels; Impact of Assumptions
Resource Driven Considerations: Availability of Solvers; Availability of Trained Resources; Cost Implications; Available Time

Shortest Path Problem
Overview of the Shortest Path Problem
The Shortest Path Problem is a network problem with the primary objective of
finding the shortest route between any pair of nodes in a network. There are
multiple forms of this problem, and most forms have corresponding specific
algorithms that are more efficient than the standard algorithm.
Decision: Which arcs to travel on?
Objective: Minimize the distance (or time) from the origin to the destination.

Dijkstra's Algorithm Standard Form
Step 1
Assign a permanent label [0,S] to the starting node (Node 1) (0 indicates the distance
from the node to itself, and S indicates that it is the starting node)
Step 2
Assign tentative labels to the nodes that can be reached directly from Node 1 (In a
label, the first number is the direct distance from Node 1, and the second number is the
preceding node in the route from Node 1)
Step 3
Identify the tentatively labeled node with the shortest distance value, and declare that
node permanently labeled
If all nodes are permanently labeled, go to step 5
Step 4
For each non-permanently labeled node that can be reached from the new permanently
labeled node:
If the node has a tentative label, calculate the distance from Node 1 through the
new permanently labeled node. If this is less than the existing tentative distance,
revise the tentative label. Go to step 3
If the node is not yet labeled, create a tentative label indicating the shortest distance
from Node 1 through the new permanently labeled node. Go to step 3
Step 5
The permanent labels identify the shortest route from Node 1 to the respective node,
and the preceding node in the shortest route
To find the shortest route to Node 1, work backwards along preceding nodes until Node
1 is reached
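The labeling procedure above translates almost directly into code. A minimal sketch in Python follows; the example network at the end is hypothetical and is not taken from the workbook's figures.

import heapq

def dijkstra(graph, source):
    # graph: dict mapping node -> list of (neighbor, arc_length) pairs
    dist = {source: 0}                  # distance labels (tentative until popped)
    pred = {source: None}               # preceding node on the shortest route
    visited = set()                     # permanently labeled nodes
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)   # tentatively labeled node with smallest distance
        if node in visited:
            continue
        visited.add(node)               # its label becomes permanent
        for neighbor, length in graph.get(node, []):
            new_d = d + length
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d  # revise the tentative label
                pred[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    return dist, pred

# Hypothetical example network: node -> [(neighbor, distance), ...]
graph = {1: [(2, 4), (3, 2)], 2: [(4, 5)], 3: [(2, 1), (4, 8)], 4: []}
print(dijkstra(graph, 1))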

Linear Programming Method Standard Form
xij = binary variable indicating whether the arc between the ith and jth nodes is
chosen
cij = The distance or length of arc (i,j)
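The accompanying formulation is not reproduced in this text; a standard statement in the notation above is:

\min \sum_{(i,j)} c_{ij} x_{ij}
\text{subject to } \sum_{j} x_{1j} = 1 \quad \text{(one unit of flow leaves the origin, Node 1)}
\sum_{i} x_{in} = 1 \quad \text{(one unit of flow enters the destination, Node } n)
\sum_{i} x_{ik} - \sum_{j} x_{kj} = 0 \quad \text{for every intermediate node } k
x_{ij} \in \{0, 1\}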

Variations to Shortest Path Problem
Single-Source Shortest Path Problem
Single-Destination Shortest Path Problem
All-Pairs Shortest Path Problem
Sample List of Algorithms for Shortest
Path Problem
Business Applications of the Shortest
Path Problem
Dijkstra's Algorithm
Bellman-Ford Algorithm
A* Search Algorithm
Floyd-Warshall Algorithm
Johnson's Algorithm
Perturbation Theory
The Shortest Path Algorithms determine the
path with the least weight (weight can be cost,
distance, time, etc.) between any pair of nodes
in a network. Some business applications of
these algorithms include:
Flight reservations
Internet packet routing
Driving directions
Telecom network routing

Exercise
Question 2.1: Choose the arcs to travel on such that the distance between node
1 and node 8 is minimized.

Review the correct answer in the Solutions section.
Exercise Reflection: Use the space below to note the important things you have
learned about solving the previous Shortest Path problem.
________________________________________________________________
________________________________________________________________
Minimum Spanning Tree Problem
Overview of the Minimum Spanning Tree Problem
The Minimum Spanning Tree Problem is typically used in a given network to
connect all the nodes of the network such that the total weight of all the arcs
used to achieve this objective is minimized. A Minimum Spanning Tree will
provide the optimal set of arcs with minimal total arc cost, time, distance or other
similar measure.
Decision: Which arcs to choose such that all nodes are connected to the
network?
Objective: Minimize the total weight of the arcs chosen.
Linear Programming Method Standard Form
To solve this problem, the network is divided into all possible combinations of two
subsets of the network such that each set of subsets together makes up the total
network.
xij = The arc between the ith and jth nodes in a network of n nodes
cij = The distance or length of arc (i,j)
A = Every possible subset of nodes within the network
B = Complement of A


Note: The optimal solution will use (n-1) arcs to connect a network of n nodes.
Using more than (n-1) arcs will potentially result in redundant arcs and/or the
formation of loops.
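As an illustration of how such a tree can be built, a minimal Python sketch of Kruskal's Algorithm (one of the algorithms listed below) follows; the example arcs are hypothetical.

def kruskal(n_nodes, arcs):
    # arcs: list of (weight, node_i, node_j); nodes are numbered 0 .. n_nodes - 1
    parent = list(range(n_nodes))

    def find(v):                        # union-find root lookup with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, i, j in sorted(arcs):        # consider arcs in order of increasing weight
        root_i, root_j = find(i), find(j)
        if root_i != root_j:            # adding the arc does not create a loop
            parent[root_i] = root_j
            tree.append((i, j, w))
    return tree                         # holds n - 1 arcs if the network is connected

# Hypothetical example: 4 nodes with weighted arcs given as (weight, i, j)
print(kruskal(4, [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 2, 3), (5, 1, 3)]))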
Variations to the Minimum Spanning Tree Problem
Optimum Communication Spanning Tree
Steiner Trees
Sample List of Algorithms
for Minimum Spanning Tree
Problem
Business Applications of the Minimum Spanning
Tree Problem
Prim's Algorithm
Kruskal's Algorithm
Boruvka's Algorithm
Reverse-Delete Algorithm
Edmonds' Algorithm
The Minimum Spanning Tree problem is used to determine the
smallest spanning tree that is needed to connect a set of nodes
in a network. The typical variables include distance, cost, time,
etc. Some business applications of this problem include:
Design of telecommunications networks
Airline routing
Design of lightly used transportation network to minimize the
total cost of providing the links
Finding routes with maximum bottleneck capacity in a
computer network
Network design of high voltage electrical transmission lines
Exercise: Identify the Minimum Spanning Tree for the sample problem given
below.
Question 2.2: Fly High Airlines wants to establish connectivity to all the major
ports in the country leveraging the shortest distance route. Connect all the ports
in the network such that the overall distance of the network is minimized.

Review the correct answer in the Solutions section.
Exercise Reflection: Use the space below to note the important things you have
learned about solving the previous Minimum Spanning Tree problem.
________________________________________________________________
________________________________________________________________
Maximal Flow Problem
Overview of the Maximal Flow Problem
The Maximal Flow Problem is used to determine the maximum amount of flow of
a given item (vehicles, fluid, materials, etc.) that can enter and exit a network in a
specific period of time. Flow is transmitted through each node in the network as
efficiently as possible. Typically, each arc is subject to certain flow restrictions
(vehicles per hour, gallons per hour) and the maximum capacity restriction is
referred to as the flow capacity for that arc. In its simplest form, it is assumed that
for each node, inflow to the node is equal to the outflow from the node (no
inventory). In this case, capacity restrictions are not assigned to the nodes.
Decision: How much flow on each arc?
Objective: Maximize flow through the network from an origin to a destination.

Linear Programming Method Standard Form
To solve this problem, add a new arc from node n (output node) back to Node 1
(input node). This arc denotes the total flow over the route network. The flow over
this arc must be maximized. Each variable is associated with each arc that
represents the quantity of flow through that arc, and there is a constraint for flow
through each node.

xij = The flow across arc from the ith to the jth node.
uij = Maximal capacity on arc from the ith to the jth node.
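The corresponding formulation is not reproduced in this text; a standard statement, using the notation above and the added return arc from node n back to Node 1, is:

\max x_{n1}
\text{subject to } \sum_{i} x_{ik} = \sum_{j} x_{kj} \quad \text{for every node } k \text{ (flow in equals flow out)}
0 \le x_{ij} \le u_{ij} \quad \text{for every arc } (i, j)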

Variations to Maximal Flow Problem
Capacity Constraints
Max-Flow Min-Cut Theorem
Sample List of Algorithms for
Maximal Flow Problem
Business Applications of the Maximal Flow
Problem
Ford-Fulkerson Algorithm
Edmonds-Karp Algorithm
Dinitz Blocking Flow Algorithm
General Push-Relabel Maximum
Flow Algorithm

The Maximal Flow Problem can be used to determine the
optimal flow of materials (such as vehicles, oil, etc.) through
each arc of a given network such that the amount of flow
through the entire network is maximized. Some business
applications of this problem include:
Oil flow through a pipeline network
Project selection
Airline scheduling
Material flow through a company's distribution network
Water supply through a system of aqueducts
Exercise
Question 2.3: The local water conservation authority is constructing new ducts for
water supply in the city. The capacity of each duct is provided in the network
representation below. Choose the route to maximize the water flow from node
1 to node 9.







Review the correct answer in the Solutions section.
Exercise Reflection: Use the space below to note the important things you have
learned about solving the previous Maximal Flow problem.
________________________________________________________________
________________________________________________________________
Section 3 Applied
Statistics (Part I)
Statistical Tools
There are several statistical tools and packages that are commercially available
to solve statistics problems. Even common software like Microsoft Excel has
Add-Ins with significant statistical capabilities.
Using Statistical Tools
Primary tools used to solve the different statistical problems:
Course Topic Primary Tools
Simple Regression and Correlation MS Excel, SPSS, SAS, Systat
Multiple Regression and Correlation MS Excel, SPSS, SAS, Systat
Time Series Analysis and Forecasting SPSS, SAS, Systat
Discriminant and Logit Analysis SPSS, SAS, Systat
Factor Analysis and Clustering SPSS, SAS, Systat

Key analyses supported by the Excel Analysis ToolPak:
Regression
Sampling
Rank and percentile
t-Test: Two Sample for Means
Correlation
Covariance
Probability Distributions
Random Variables
A variable is random if it assumes different values as a result of the outcome of a
random experiment, for example, a coin toss.
There are two types of random variables:
Discrete Random Variable: A discrete random variable is one for which the
number of possible outcomes can be counted, and for each possible outcome,
there is a measurable and positive probability.
Example: Number of days it rains in a given month, number of patients visiting
a clinic on each day of the previous week.
Continuous Random Variable: A continuous random variable is one for which
the number of possible outcomes is infinite, even if lower and upper bounds
exist.
Example: The actual amount of daily rainfall between zero and 10 inches is an
example of a continuous random variable because the actual amount of rainfall
can take on an infinite number of values.
Expected Value of a Random Variable
The expected value of a random variable can be obtained by multiplying each
value that the random variable can assume with the probability of occurrence of
that value, and then adding up all these products.
Expected Value of a Random Variable: E(x) = x1P1 + x2P2 + … + xnPn
x = Value of the Random Variable
P = Probability of Occurrence of that Value
n = Number of values the random variable can assume
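A one-line check of such a calculation in Python (the values and probabilities below are hypothetical, not those of the exercise that follows):

# Expected value: sum of (value * probability) over all possible outcomes
values = [0, 1, 2]
probabilities = [0.2, 0.5, 0.3]
expected_value = sum(x * p for x, p in zip(values, probabilities))
print(expected_value)   # 1.1 for these hypothetical numbers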
Exercise
Question 3.1: Suppose Jim goes to two movies 10% of all weekends, he goes to
one movie 40% of the time, and he goes to no movies 50% of the time. What is
the expected value for the number of movies he goes to during a weekend?








Review the correct answer in the Solutions section.

Exercise Reflection: Use the space below to note the important things you have
learned about solving the previous Expected Value problem.
________________________________________________________________
________________________________________________________________
Probability Distributions
Probability distributions arise from experiments where the outcome is subject to
chance. A Probability Distribution describes the probabilities of all the possible
outcomes for a random variable, such as getting tails on the toss of a coin or the
probability that a call center representative will convert a sale on a given call.
Characteristics
The probability of all possible outcomes must sum to one
It is a listing of the probabilities of all the outcomes that could result if an
experiment was conducted
Example
A simple Probability Distribution is that for the roll of one fair die: there are six
possible outcomes, each with a probability of 1/6, so they sum to one.
The Probability Distribution of all the possible returns on the S&P index is a more
complex version of the same idea.
A frequency distribution is different from a Probability Distribution. Frequency
distribution is the process of listing all the observed frequencies of all outcomes
in an experiment while it was conducted. A Probability Distribution is a listing of
the probabilities of all the outcomes that could result if the experiment was
conducted.
Types of Probability Distributions
The nature of the experiment dictates which Probability Distribution may be
appropriate for modeling the resulting random outcomes.
There are two types of probability distributions:
Discrete Probability Distribution (appropriate for discrete random variables):
Discrete Probability Distributions can assume only certain outcomes. The
outcomes are mutually exclusive.
Examples:
The number of students in a class
The number of children in a family
The number of cars entering a carwash in an hour
Number of home mortgages approved by Coastal Federal Bank last week

Continuous Probability Distribution (used for continuous random variables):
Continuous Probability Distributions can assume an infinite number of values
within a given range.
Examples:
The distance students travel to class
The time it takes an executive to drive to work
The length of an afternoon nap
The length of time of a particular phone call

Types of Discrete Probability Distributions
Binomial: Binomial Distributions describe discrete data resulting from an
experiment known as the Bernoulli Process.
Poisson: Poisson Distributions express the probability of a number of events
occurring in a fixed period of time if these events occur with a known average
rate and independently of the time since the last event.
Binomial Distributions
Standard Formula for Binomial Distributions
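The formula itself is not reproduced in this text; its standard form, in the notation defined below, is:

p(r) = \frac{n!}{r!\,(n - r)!}\; p^{r} q^{\,n - r}, \qquad \mu = np, \qquad \sigma = \sqrt{npq}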

Where the following standard notations apply
p(r) = Probability of r successes in n trials
p = Characteristic probability or probability of success
q = 1 p = Probability of failure
r = Number of successes desired
n = Number of trials undertaken
μ = Population mean
σ = Standard deviation
! denotes factorial; 5! = 5*4*3*2*1 = 120

Exercise
Question 3.2: The probability of converting a sale on any given call for an
outbound call center representative is 0.6. If the representative takes 6 calls per
hour, what is the probability that he/she will convert exactly 2 sales?
Solution: Apply the binomial formula just discussed with the following values:
n = 6
r = 1, 2, 3, …, 6
p = 0.6
q = 0.4
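A minimal sketch of this calculation in Python, using only the standard library, applies the binomial formula directly for r = 2:

from math import comb

n, p = 6, 0.6
q = 1 - p
r = 2
prob_exactly_2 = comb(n, r) * p**r * q**(n - r)   # n! / (r!(n-r)!) * p^r * q^(n-r)
print(round(prob_exactly_2, 4))                   # approximately 0.1382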

The Binomial Distributions for a variety of situations can be calculated in this
manner and are illustrated below.

Reflective Question: What is the probability that he/she will convert up to 5
sales per hour?










Exercise Reflection: Use the space below to note the important things you have
learned about solving problems using Binomial Distributions.
________________________________________________________________
________________________________________________________________
Poisson Distributions
Standard Formula for Poisson Distributions
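The formula is not reproduced in this text; its standard form, in the notation defined below, is:

f(x) = \frac{\mu^{x} e^{-\mu}}{x!}, \qquad \sigma = \sqrt{\mu}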

f(x) = Probability of x occurrences in an interval
μ = Mean number of occurrences in an interval (the population mean)
e = 2.71828
σ = Standard deviation

Types of Continuous Probability Distributions - Normal
Distributions
A Probability Distribution is called continuous if its cumulative distribution function
is continuous.
Description
A Normal Distribution, also known as the Gaussian distribution, describes
continuous data where the random variable can assume any value within a
given range, and the Probability Distribution is continuous
The Normal Distribution is very important in statistics as it has
properties that make it applicable to a wide variety of situations,
and it comes close to matching the observed frequency
distributions of many phenomena
The areas under the curve represent probabilities, and the total
area under the normal curve is 1.00
As the tails never reach the horizontal axis, the theoretical
model can assign impossible empirical values, but not much
accuracy is lost by ignoring values far out in the tails
Although the Normal Distribution is continuous, it can be used
to approximate discrete distributions whenever np and nq are
at least 5
Characteristics

The curve is bell-shaped and has a single peak (unimodal)
The mean of the normally distributed population lies at the center of the normal
curve
Due to its symmetry, the mean, median and mode are of the same value
The tails of the Normal Distribution extend indefinitely and never touch the
horizontal axis
Standard Deviation

The areas under the curve represent probabilities, and the total area under the
normal curve is 1.00. It can be noted that:
Approximately 68% of the values in a normally distributed population lie within
+/- 1 standard deviation from the mean
Approximately 95.5% of the values in a normally distributed population lie
within +/- 2 standard deviations from the mean
Approximately 99.7% of the values in a normally distributed population lie
within +/- 3 standard deviations from the mean
Testing for Normality


Graphical Method:
Comparing the histogram plotted for all residuals (error terms) to a normal
curve is a quick test of normality of data
The normal probability plot is a formal graphical tool to confirm normality. In a
normal probability plot, the data is plotted against a theoretical Normal
Distribution in such a way that the points should form an approximate straight
line. Departures from this straight line indicate departures from normality
Other rigorous statistical methods used to test for normality include Pearson's
Chi-Square Test, the Anderson-Darling Test, and the Shapiro-Wilk Test
When removing data that lies two to three standard deviations from the mean,
always go back and verify that other metrics (spend, revenue, etc.) are not
disproportionately affected or reduced
Testing data for normality is critical since assuming the data distribution is normal
and keeping only values within +/- 2 or +/- 3 standard deviations may lead to the
exclusion of important data points.
Other Common Probability Distributions
Continuous Uniform Distribution [U(a,b)]:
A family of Probability Distributions such that for each member of the family, all
intervals of the same length on the distribution's support are equally probable
Probability Density Function: f(x) = 1 / (b - a) for a ≤ x ≤ b; 0 otherwise
Population Mean = (a + b) / 2
Variance = (b - a)² / 12
Standard Deviation = (b - a) / √12
One of the most common applications of this distribution is to generate random
numbers

Exponential Distribution:
Represents a process in which events occur continuously and independently at a
constant average rate
Probability Density Function: f(x) = λe^(-λx) for x ≥ 0; 0 for x < 0, where λ > 0 is
the parameter of the distribution, called the rate parameter
Population Mean = 1 / λ
Variance = 1 / λ²
Standard Deviation = 1 / λ
Service times of bank tellers, call center agents, etc. may be modeled as
Exponential Distributions. Other applications include situations where certain
events occur with a constant probability per unit length

Gamma Distribution:
A two-parameter family of continuous Probability Distributions. It has a scale
parameter θ and a shape parameter k
Probability Density Function: f(x; k, θ) = x^(k-1) e^(-x/θ) / (θ^k Γ(k)) for x > 0
and k, θ > 0
Population Mean = kθ
Variance = kθ²
Standard Deviation = θ√k
The Gamma Distribution is frequently used to model waiting times; for instance,
in life testing, the waiting time until death is a random variable which is frequently
modeled with a Gamma Distribution

Student's t-Distribution:
Student's t-Distribution (or simply the t-distribution) is a Probability Distribution
used to model a normally distributed population when the sample size is small
Probability Density Function: f(t) = Γ((ν + 1)/2) / (√(νπ) Γ(ν/2)) * (1 + t²/ν)^(-(ν + 1)/2),
where ν is the number of degrees of freedom and Γ is the gamma function
Population Mean = 0 for ν > 1, otherwise undefined
Variance = ν / (ν - 2) for ν > 2, otherwise undefined
Standard Deviation = √[ν / (ν - 2)] for ν > 2, otherwise undefined
Student's t-Distribution is used when the population standard deviation must be
estimated from the data
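All of these distributions are available in standard statistics libraries. A brief hedged sketch using SciPy follows; the parameter values are arbitrary, and note SciPy's own parameterizations (the exponential uses scale = 1/λ, the gamma uses a = k and scale = θ).

from scipy import stats

u = stats.uniform(loc=2, scale=8)    # U(a=2, b=10): loc = a, scale = b - a
e = stats.expon(scale=1 / 0.5)       # Exponential with rate lambda = 0.5
g = stats.gamma(a=3, scale=2.0)      # Gamma with shape k = 3 and scale theta = 2
t = stats.t(df=10)                   # Student's t with 10 degrees of freedom

for dist in (u, e, g, t):
    print(dist.mean(), dist.std(), dist.pdf(1.0))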
Sampling Techniques
Overview
Sampling is the part of statistical practice concerned with the selection of
individual observations intended to yield knowledge about a population of
concern, especially for the purposes of statistical inference.
The stages of the sampling process are:
Defining the population of concern
Specifying a sampling frame, a set of items or possible events to measure
Specifying a sampling method for selecting items or events from the frame
Determining the sample size
Implementing the sampling plan
Sampling and data collecting
Reviewing the sampling process

Central Limit Theorem
The Central Limit Theorem states that the sampling distribution of the mean
approaches normality as the sample size increases.
This relationship between the shape of a Population Distribution and the shape
of the sampling distribution of the mean is called the Central Limit Theorem
The importance of this theorem is that it permits us to use sample statistics to
make inferences about population parameters without knowing anything about
the nature of the distribution for that population other than what we can get
from the sample
The charts below illustrate that the distribution of sample means reach normality
as the sample size increases. Since we know the Normal Distribution
characteristics, which are described by just two parameters (mean and standard
deviation), we can now better estimate the characteristics of the entire
population.
[Charts: distributions of sample means for sample sizes n = 1, 5, 10, and 25]
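A small simulation makes the theorem concrete. The sketch below uses NumPy with an arbitrary, deliberately skewed population (an assumption for illustration only) and shows the distribution of sample means tightening as n grows:

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, clearly non-normal

for n in (1, 5, 10, 25):
    # draw 5,000 samples of size n and record each sample's mean
    sample_means = rng.choice(population, size=(5000, n)).mean(axis=1)
    print(n, round(sample_means.mean(), 3), round(sample_means.std(), 3))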

Types of Sampling Techniques
There are two types of sampling techniques:
Judgment Sampling
Random Sampling
Methods of Random Sampling
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
Common examples of sampling bias are:
Data Mining Bias
Sample Selection Bias
Survivorship Bias
Look-Ahead Bias
Time Period Bias


Hypothesis Testing
Description of Hypothesis Testing
Hypothesis testing is a method of making statistical decisions using experimental
data. It decides whether experimental results contain enough information to cast
doubt on conventional wisdom.
Steps in Hypothesis Testing
Begin with an assumption or a hypothesis that is made about a population
parameter
Collect sample data and conduct statistical analysis for the sample, which is
then used to determine the likelihood that the hypothesized population
parameter is correct
Key Concepts
Null Hypothesis: This is the statement of the assumed or hypothesized value of
the population parameter before we begin sampling. This assumption is called
the null hypothesis and is denoted by H0. The null hypothesis is the default,
conservative assumption. The test determines whether the data provides sufficient
evidence to reject it in favor of the alternate.
Alternate Hypothesis: Whenever the null hypothesis is rejected, the conclusion
that is accepted is called the alternate hypothesis and is denoted by HA/ H1
Significance Level: It is defined as the probability of making a decision to reject
the null hypothesis when the null hypothesis is actually true (i.e., a Type I error,
or false positive)
Two-Tailed and One-Tailed Tests: A two-tailed test will reject the null
hypothesis if the sample mean is significantly higher or lower than the
hypothesized population mean, so there are two rejection regions. This can be
contrasted with the one-tailed test, where there is only one rejection region
Standard Error: The standard error of a method of measurement or estimation
is the standard deviation of the sampling distribution associated with the
estimation method
Five-Step Process for Hypothesis Testing
Step 1:
State your hypotheses. Decide whether this is a two-tailed or one-tailed test. Select a
level of significance appropriate for this decision.
Step 2:
Decide which distribution (t or z) is appropriate (from the table below) and find the critical
values for the chosen level of significance from the appropriate table.
Step 3:
Calculate the standard error of the sample statistic. Use the standard error to convert the
observed value of the sample statistic to a standardized value.
Step 4:
Sketch the distribution and mark the position of the standardized sample value and the
critical values for the test.
Step 5:
Compare the value of the standardized sample statistic with the critical values for this test
and interpret the result.
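A minimal sketch of steps 3 through 5 for a two-tailed z-test in Python is shown below; the numbers are hypothetical (not the Fizz-O data in the next exercise) and a 5% level of significance is assumed.

from scipy import stats

sample_mean, hyp_pop_mean = 52.1, 50.0   # hypothetical sample mean and hypothesized mean
pop_std, n = 6.0, 100                    # hypothetical population std dev and sample size
alpha = 0.05

standard_error = pop_std / n ** 0.5
z = (sample_mean - hyp_pop_mean) / standard_error   # standardized sample value
z_critical = stats.norm.ppf(1 - alpha / 2)          # two-tailed critical value
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print(z, z_critical, p_value)            # reject H0 if |z| > z_critical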

Decision Table for Distribution Selection

Exercise
Question 3.4: Fizz-O, a leading cola manufacturing and distribution company, is
considering expanding its operations in New Jersey and Delaware. As a part of
developing its expansion strategy, Fizz-O wants to establish if the average
annual consumption of cola in these two states is different from that of the entire
US. Fizz-O's marketing team has already conducted a survey of 400 people
(identified using random sampling) in each of the two states, and determined the
state-wise cola consumption levels: sample average = 1.6 gallons/year in NJ
and 2.0 gallons/year in DE. It is known that the average cola consumption
across the US is 1.2 gallons/year with standard deviation = 6.
Solution:
Step 1: State your hypotheses and decide whether this is a two-tailed or one-
tailed test.
Define the Hypothesis:

Decide Test:

This is a two-tailed test because our business decision is impacted if the state-
level annual average consumption is either higher than or lower than the national
annual average consumption.
Reflective Question: What will be the appropriate level of significance for this
decision?












Exercise Reflection: Use the space below to note the important things you have
learned about solving the previous Hypothesis testing problem.
________________________________________________________________
________________________________________________________________

Section 4 Fundamentals
of Operations Research
(Part II)
Basic Concepts of Integer Programming
Integer Programming Concepts
When all the variables are integers, the integer program is called an All Integer
Program. When some, but not all, of the variables are integers, the integer
program is called a Mixed Integer Program.
In many applications of Integer Programming, one or more integer variables are
required to equal either 0 or 1. Such variables are called binary variables. If all
variables are 0-1 variables, it is a 0-1 Integer Program.
The Linear Program that results from dropping the integer requirements is called
the Linear Program Relaxation of the Integer Program.
Cost of Production
In many fixed cost applications, the cost of production has two components:
Setup, which is a fixed cost
Variable cost, which is directly related to the production quantity
A setup cost is included in a model for a production application using binary
variables (1 to produce, 0 not to produce).
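One common way to write this (a standard fixed-charge formulation, not reproduced from the workbook) uses a binary variable y and a sufficiently large constant M:

\min \; c x + f y
\text{subject to } x \le M y, \qquad x \ge 0, \qquad y \in \{0, 1\}

where x is the production quantity, c the variable cost per unit, f the setup cost, and M an upper bound on the production quantity.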
Exercise
Question 4.1: Three raw materials are used to produce three products (in tons):
a fuel additive, a solvent base, and a laundry detergent.

The company has 20 tons of Material A, 5 tons of Material B, and 21 tons of
Material C, and is interested in determining the optimal production quantities for
the upcoming planning period.

Solution
Step 1: Formulate the Linear Program


Step 2: Conversion to Integer Programming Form


Reflective Question: What will be the final Cost Model for the problem?








Exercise Reflection: Use the space below to note the important things you have
learned about solving problems using the Integer Programming model.
________________________________________________________________
________________________________________________________________


Sensitivity Analysis
Sensitivity Analysis
Sensitivity analysis is the study of how the changes in the coefficients of a Linear
Program affect the optimal solution.
Optimization Using Excel Solver
Excel Solver is an add-in for Microsoft Excel that can solve Linear Programming
problems. You can install it by selecting the Microsoft Office Button >
Excel Options > Add-Ins > Solver Add-in.
Some of the other integer linear programming software packages available on the
market are:
MPSX MIP
OSL
CPLEX
LINDO

Section 5 Network
Problems (Part II)
Minimum Cost Flow Problem
Overview of the Minimum Cost Flow Problem
The Minimum Cost Flow Problem is used to send flow from a set of supply nodes
to a set of demand nodes through the arcs of a network, at minimum total cost,
and without violating the lower and upper bounds on flows through the arcs. This
problem is used for moving only one product / commodity at a time.
Decision: Which arcs are to be used, given the lower and upper bounds on each
arc?
Objective: Minimize total cost


For each arc:
x = Cost of transportation per unit
y = Lower capacity constraint
z = Upper capacity constraint


For each supply node:
[a = Available supply of commodity X]

For each demand node:
[b = Demand for commodity X]

Linear Programming Method Standard Form
i = Index for origins, i = 1, 2, 3, …, m
j = Index for destinations, j = 1, 2, 3, …, n
cij = Cost per unit shipped from origin i to destination j
si = Supply or capacity in units at origin i
dj = Demand in units at destination j
lij = Lower bound on the flow from origin i to destination j
uij = Capacity (upper bound) on the flow from origin i to destination j
xij = Number of units shipped from origin i to destination j, where xij is only
defined for arcs that exist in the network
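The formulation itself is not reproduced in this text; a standard statement in the notation above is:

\min \sum_{(i,j)} c_{ij} x_{ij}
\text{subject to } \sum_{j} x_{ij} - \sum_{k} x_{ki} = b_i \quad \text{for every node } i
l_{ij} \le x_{ij} \le u_{ij} \quad \text{for every arc } (i, j)

where b_i = s_i at a supply node, b_i = -d_i at a demand node, and b_i = 0 at a transshipment node.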


Variations to Minimum Cost Flow Problem
Assignment Problem
Transshipment Problem
Transportation Problem
Shortest Path Problem
Maximal Flow Problem
Unbalanced Minimum Cost Flow Problems
Sample List of Algorithms for Minimum Cost Flow Problem
Negative Cycle Algorithm
Successive Shortest Path Algorithm
Primal-Dual Algorithm
Out-of-Kilter Algorithm


Transportation Problem
Overview of the Transportation Problem
The Transportation Problem can be used to minimize the cost of shipping goods
from multiple origins to multiple destinations. It is typically used in distribution
planning, where the quantity of goods available at a supply location is limited,
and the quantity of goods required at each demand location is known. This is a
more specific form of the Minimum Cost Flow problem.
Decision: How much to ship along each arc between any origin and destination?
Objective: Minimize shipping cost.


Linear Programming Method Standard Form
i = Index for origins, i = 1, 2, 3, …, m
j = Index for destinations, j = 1, 2, 3, …, n
xij = Number of units shipped from origin i to destination j
cij = Cost per unit shipped from origin i to destination j
si = Supply or capacity in units at origin i
dj = Demand in units at destination j


Variations to Transportation Problem
Assignment Problem:
All supply and demand values equal 1 and the amount shipped over each arc
is either 0 or 1
Primarily used for assignment of resources to specific tasks such as project
staffing in large corporations and deployment of armed forces personnel
Supply vs. Demand Problem:
Total supply is not equal to total demand
For this sample business problem, you can create a dummy supply and
demand node which acts as a catch-all for the excess supply and demand
Other Problems:
Objective function is maximized rather than minimized (e.g., profit criterion)
Routes that have specified capacity restrictions or minimums
Some routes may be unacceptable
Sample List of Algorithms for Transportation Problem
Northwest Corner rule
Minimum Cost Method
Vogel's Approximation Method
Stepping Stone Method
Modified Distribution Method
Multi-Commodity Flow Problem
Overview of the Multi-Commodity Flow Problem
The Multi-Commodity Flow Problem is a Network Flow Problem that has multiple
commodities flowing through a network, where each commodity has different
supply and demand nodes and each arc route has capacity restrictions. For large
instances, it is often only practical to use approximation algorithms.
Decision: How much quantity of each commodity should be sent through each
arc, given supply-demand and capacity constraints?
Objective: Flow assignment that satisfies the constraints.

For each arc:
x = Cost of transportation per unit (varies for each commodity)
y = Lower capacity constraint
z = Upper capacity constraint

For each supply node:
a = Available supply of commodity A
b = Available supply of commodity B
c = Available supply of commodity C

For each demand node:
d = Demand for commodity A
e = Demand for commodity B
f = Demand for commodity C

Linear Programming Method Standard Form
k = Index for commodities, k = 1, 2, 3, …, K
ckij = Cost per unit of commodity k along arc (i,j)
uij = Capacity on arc (i,j)
ski = Available supply of commodity k at node i
dkj = Required quantity (demand) of commodity k at node j
xkij = Flow of commodity k along arc (i,j), where xkij is defined only for those arcs
that exist in the network


Variations to Multi-Commodity Flow Problem
Minimum Cost Multi-Commodity Flow Problem: This problem is applied where
there is a cost associated with sending flow on each arc that needs to be
minimized
Maximum Multi-Commodity Flow Problem: This problem is applied where there
are no hard demands on each commodity, but the total throughput has to be
maximized
Maximum concurrent flow problem: This problem is applied where the task is to
maximize the minimal fraction of the flow of each commodity to its demand
Sample List of Algorithms for Multi-Commodity Flow Problem
Dantzig-Wolfe Decomposition
Frank-Wolfe Algorithm
Lagrangian Relaxation
Augmented Lagrangian Relaxation
Proximal Decomposition
Which Problem to Choose?
Common Limitations of Network Flow Problems
Most business problems may not perfectly fit into the format of a particular
Network Flow Problem. These problems can be used as a basis for
conceptualizing other heuristics
Most special purpose algorithms can be used to solve only single objectives. It
may be necessary to use Linear Programming or other heuristics if there are
additional constraints or objectives
When arc values in a particular network are negative, customized algorithms
need to be used
Depending on the nature of the business problem, objectives may need to be
maximized or minimized. For example, given that the Shortest Path Algorithm
always identifies a minimum value solution, it may not be ideal to apply the
algorithm to situations that involve a profit criterion

Dynamic Programming
Overview of the Dynamic Programming
Dynamic Programming is a unique problem solving approach that decomposes a
large, complex problem into multiple smaller problems that are easier to solve.
The Dynamic Programming approach results in the optimal solution for the large
problem once all the smaller problems have been solved.
Dynamic Programming Method Standard Form
xn = State variables, which represent input to stage n (output from stage n + 1)
dn = Decision variable at stage n
tn = Stage transformation function that determines the stage n output
rn = Return function for stage n, which represents the payoff or value for a stage
N = Number of stages in the dynamic program; n varies from 1 to N.
The general expression for the stage transformation function is xn-1 = tn (xn, dn)
The general expression for the return function is rn (xn, dn)
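In this notation, the optimal value function is computed stage by stage with a recursion of the standard form (not reproduced from the workbook):

f_n(x_n) = \max_{d_n} \big\{ r_n(x_n, d_n) + f_{n-1}\big( t_n(x_n, d_n) \big) \big\}, \qquad f_0(\cdot) = 0

where f_n(x_n) is the best total return obtainable from stages n, n-1, …, 1 given input x_n.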


Section 6 Applied
Statistics (Part II)
Simple Regression and Correlation
Overview of Simple Regressions and Correlations
Regressions and correlations deal with the determination of relationships
between variables. Both regression and correlation analyses help to determine
the nature and strength of a relationship between variables.
Regression analysis is used to develop an estimating equation, which is a
mathematical description of the relationship between a known variable and an
unknown variable.
Correlation analysis is used to determine the degree to which the variables are
related. In essence, correlation analysis is used to decide how well the estimating
equation actually describes the relationship.
Causality between Variables
There is usually a causal relationship between the dependent and independent
variables. For example, consider the relationship between advertising spend and
sales: an increase in advertising spend causes an increase in sales.



Scatter Diagram
A Scatter Diagram is a chart on which paired observations of the two variables are plotted as points.
Some of the uses of a Scatter Diagram are:
Helps visually identify if there are any patterns to indicate that the variables are
related
Identifies the kind of line and required estimation equation that describes the
relationship
Different types of scatter diagrams are:


Equation for a Straight Line



To fit a regression line mathematically, it is necessary to fit a line such that it
minimizes the total square error between the estimated points on the line and
actual observed points that were used to draw it.
Squaring the errors magnifies (or penalizes) larger errors, and prevents positive
and negative errors from canceling each other out.


Formulas for the Method of Least Squares
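The formulas are not reproduced in this text; the standard least-squares estimates for the line \hat{Y} = a + bX are:

b = \frac{\sum XY - n\,\bar{X}\bar{Y}}{\sum X^{2} - n\,\bar{X}^{2}}, \qquad a = \bar{Y} - b\,\bar{X}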




The slope of a line (b) obtained using linear least squares fitting is called the
Regression Coefficient.
Estimating the Regression Equation
Several statistical packages are readily available that estimate the regression
equation and provide the coefficients.

Example: Microsoft Excel

For any set of values for X and Y, Excel can be used to rapidly plot the linear
trend line and derive the regression equation.
Standard Error of Estimate
The measure of reliability is called the Standard Error of Estimate.
Standard Error of Estimate is denoted by se. It measures the variability, or
scatter, of the observed values around the regression line. Statistical packages
calculate the se and provide the value as the output.
Formula for Standard Error
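The formula is not reproduced in this text; the standard definition, where \hat{Y} is the fitted value and n the number of observations, is:

s_e = \sqrt{\frac{\sum (Y - \hat{Y})^{2}}{n - 2}}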

Interpretation
The se can be used to form bounds around the regression line as follows:
68% of the points can be found within a band of +/- 1 se around the regression
line
95.5% of the points can be found within a band of +/- 2 se around the
regression line
99.7% of the points can be found within a band of +/- 3 se around the
regression line

Correlation Analysis
Correlation Analysis is a statistical tool that is used to describe the degree to
which one variable is linearly related to another. It is used in conjunction with
regression analysis to measure how well the regression line explains the
variation of the dependent variable.

The sign of r indicates the direction of the relationship between the two variables.
r2 = 1 and r = 1, means that the two variables are perfectly correlated and the
slope of the line is positive
r2 = 0 and r = 0, means that the two variables are not at all correlated
r2 = 1 and r = -1, means that the two variables are perfectly negatively
correlated and the slope of the line is negative
For example, if r2 = 0.45, it means that only 45% of the total variation in the
dependent variable is explained by the regression line. It is important to note that
r2 measures only the strength of a linear relationship between two variables.


Multiple Regression and Correlation
Overview of Multiple Regression
More than one variable is used to estimate the dependent variable to increase
the accuracy of the estimate. For example, there is a positive relationship
between demand for sunglasses and various demographic characteristics (age,
income) of the buyers that is, demand varies directly with changes in their
characteristics.
This process is called multiple regression and correlation, and is based on the
same assumptions and processes we discussed in simple regression.
Example
Sale of Beer = β0 + β1(Temperature) + β2(NASDAQ Levels) + β3(Price of Beer)
+ β4X4 + β5X5 + …
Three Step Process Multiple Regression and Correlation
Analysis
Step 1: Describe the Multiple Regression Equation
Step 2: Examine the Multiple Regression Standard Error of Estimate
Step 3: Use Multiple Correlation Analysis to determine how well the regression
equation describes the observed data and refine the model by adding or
changing the terms as necessary
Assumptions
Some of the key assumptions in Multiple Regression Analysis are:
Normality
Linearity
Reliability
Homoscedasticity

Standard Estimating Equation for Multiple Regression


The multiple regression equation contains several types of terms that are
introduced based on the situation. Some of the types are:
Linear Terms: Terms that affect the dependent variable linearly, e.g., X1, X2
Non-Linear Terms: Terms that affect the dependent variable non-linearly, e.g., X3²
Dummy Variables: Terms that represent qualitative factors like gender and can
have discrete values or levels
Interaction Variables: Terms that represent the combined effect of two
independent variables on the dependent variable, e.g., X1X2
Sample Multiple Regression Equation

Dummy / Binary Variables
Dummy, or Binary variable regression models involve usage of categorical (non-
quantitative) variables with two or more levels. The number of dummy variables
used is one less than the number of levels of the categorical variable.
Examples
Gender is a categorical variable with two levels that can be coded as 0 and 1
States in the U.S. is a categorical variable with 50 possible levels
Interaction Variables
An interaction variable is a variable often used in regression analysis, formed by
the multiplication of two independent variables. An interaction regression model
is used when the response to one independent variable depends on the level of
another independent variable.
Multiple Regression model equation with interaction term:

where β3X1X2 is the interaction term
Standard Error of Estimate for Multiple Regression



Overview and Effect of Multicollinearity (Model Issue)
Multicollinearity is a statistical phenomenon in which two or more predictor
variables in a Multiple Regression model are highly correlated with one another,
undermining the assumption that the predictors are independent
While conducting Multiple Regression analysis, the regression coefficients
become less reliable as the degree of correlation between the independent
variables increases
A common symptom is that variables which are highly significant in separate
simple regressions appear collectively very significant in the Multiple
Regression but individually insignificant
Although it may still be possible to make estimations when Multicollinearity is
present, results may change erratically in response to small changes in the
model or the data
This is particularly important because it is no longer possible to accurately predict
how the dependent variable will change as you tweak any of the independent
variables that are correlated with another independent variable
Indicators of Multicollinearity (Model Issue)
Large changes in the regression coefficients when an independent variable or additional observations are added
The model as a whole does a good job of explaining the data, but none or few of the coefficients are statistically significant by themselves
A Variance Inflation Factor (VIF) greater than 5, where VIFj = 1 / (1 − Rj²) and Rj² is the R-squared from regressing the j-th independent variable on the remaining independent variables (a computation sketch follows this list)
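A minimal sketch of the VIF calculation with statsmodels; the predictor data frame and its deliberately collinear columns are made up for illustration.

# A minimal sketch: Variance Inflation Factors with statsmodels.
# The data frame X of predictors is a hypothetical illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 * 0.9 + rng.normal(0, 0.1, 100),   # deliberately collinear with x1
    "x3": rng.normal(size=100),
})

X_const = sm.add_constant(X)    # VIF is computed on the full design matrix
vifs = {col: variance_inflation_factor(X_const.values, i)
        for i, col in enumerate(X_const.columns) if col != "const"}
print(vifs)                     # x1 and x2 should show large VIFs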
Common Remedies for Multicollinearity (Model Issue)
Drop one of the variables causing the Multicollinearity, at the risk of introducing omitted-variable bias into the remaining coefficient estimates
Obtain more data
Overview and Effects (Heteroscedasticity & Error Trends)
Ideally, residuals (error terms) are randomly scattered around 0 (the horizontal line), giving a relatively even distribution. Heteroscedasticity is indicated when the residuals are not evenly scattered around the line. Example: the error variance could grow with each observation, something that is often the case with cross-sectional or time series measurements
Heteroscedasticity does not mean your coefficient estimates are wrong, but rather that the model becomes less precise (and its standard errors less reliable) as the error variance grows
Heteroscedasticity often occurs when there is a large difference among the sizes of the observations
Seeing other trends (e.g., a nonlinear relationship) in the residuals is a clue that the model is missing terms
Detection and Remedy (Heteroscedasticity & Error Trends)

Residual plots (plots of the error terms) in Multiple Regression Analysis allow visual detection of heteroscedasticity (see the sketch after this list)
Dealing with heteroscedasticity is reasonably straightforward but a little
technical. Techniques are widely available and can be found through
textbooks, SMEs, etc.
For dealing with other error trends, you need to add additional terms to your model. For example, if you see a parabola in the error terms, you should try adding an x² term
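A minimal sketch of both checks, using made-up data whose error spread grows with x; the Breusch-Pagan test is just one of the widely available techniques referred to above.

# A minimal sketch: detecting heteroscedasticity with a residual plot and
# the Breusch-Pagan test. Data and column names are hypothetical.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(1, 10, n)
y = 3 + 2 * x + rng.normal(0, 0.5 * x, n)   # error spread grows with x
df = pd.DataFrame({"x": x, "y": y})
model = smf.ols("y ~ x", data=df).fit()

# Visual check: residuals vs. fitted values should form an even band around 0
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, color="grey")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()

# Statistical check: a small p-value suggests heteroscedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)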
Exercise: Using what you've just learned, interpret the output for the following problem:
Question 6.1: For this problem, we'll return to the case of Moondrop Airline Corporation (MAC) from the Network Problems course. MAC has expanded its
operations to cover 15 terminals and has recently conducted a survey across
these terminals for the month of February. The information collected covers
sales, spend on promotions, number of competing airlines at that terminal and
the number of passengers who flew for free.
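Before looking at the tool output, here is a minimal sketch of how such a regression could be fit in Python; the column names (Sales, PromoSpend, Competitors, FreePassengers) and the numbers are hypothetical stand-ins for MAC's survey, not the workbook's actual data.

# A minimal sketch: multiple regression for the MAC survey.
# Column names and data are hypothetical stand-ins, not the workbook's figures.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 15  # 15 terminals
df = pd.DataFrame({
    "PromoSpend": rng.uniform(5, 50, n),        # promotion spend
    "Competitors": rng.integers(0, 6, n),       # competing airlines at terminal
    "FreePassengers": rng.integers(0, 200, n),  # passengers who flew for free
})
df["Sales"] = (100 + 4.0 * df["PromoSpend"]
               - 12.0 * df["Competitors"]
               - 0.2 * df["FreePassengers"]
               + rng.normal(0, 15, n))

model = smf.ols("Sales ~ PromoSpend + Competitors + FreePassengers", data=df).fit()
print(model.summary())   # coefficients, standard error of estimate, R-squared

The coefficient table produced by model.summary() parallels the tool output interpreted in the solution below.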
Solution
Step 1: Input
Step 2: Output


Reflective Question: To arrive at the multiple regression equation, how are the
coefficients interpreted?









Exercise Reflection: Use the space below to note the important things you have
learned about solving problems using the Multiple Regression and Correlation
Analysis.
________________________________________________________________
________________________________________________________________
Factor Analysis, Clustering & Discriminant Analysis
Overview of Factor Analysis
Factor analysis (closely related to Principal Component Analysis, or PCA, which is often used as its extraction method) is a statistical method used for data reduction and summarization. Observed variables are
represented in terms of variables which are unobserved (factors). It investigates
whether a number of variables of interest are linearly related to a small number of
unobserved factors.
Benefits of Factor Analysis
The primary benefit of Factor Analysis is that a large number of correlated
variables can be reduced to a manageable level:
A smaller number of factors results in easier interpretation and reduced complexity
Effects of Multicollinearity are eliminated as the factors are orthogonal to each
other
Commercially available statistics packages can be used to conduct this analysis.
Excel doesn't have a built-in capability to conduct Factor Analysis.
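As one example of such a package, here is a minimal sketch using scikit-learn's FactorAnalysis on made-up survey ratings; the variable names and the two latent factors are assumptions for illustration only.

# A minimal sketch: factor analysis with scikit-learn on hypothetical survey data.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)
n = 200
# Two latent factors drive the observed ratings (data entirely made up)
luxury = rng.normal(size=n)
value = rng.normal(size=n)
X = pd.DataFrame({
    "Prestige":         0.8 * luxury + rng.normal(0, 0.3, n),
    "StrongBrand":      0.9 * luxury + rng.normal(0, 0.3, n),
    "ValueForMoney":    0.8 * value + rng.normal(0, 0.3, n),
    "PriceSensitivity": 0.7 * value + rng.normal(0, 0.3, n),
})

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(StandardScaler().fit_transform(X))   # factor scores per respondent
loadings = pd.DataFrame(fa.components_.T, index=X.columns,
                        columns=["Factor 1", "Factor 2"])       # loading table
print(loadings)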
Factor Tables
The parameters (coefficients) of the linear function between the observed variables and the unobserved factors are provided in the output table:

Variables              Luxury    Factor 2   Factor 3
Prestige               0.7655    0.1242     0.3343
Strong Brand           0.9876    0.3423     0.5684
Variable 3             0.4566    0.4533     0.8977
Variable 4             0.3424    0.9856     0.3455
                       0.4666    0.6753     0.3453
World-class Service    0.7643    0.2342     0.5564
Value for Money        0.1226    0.4674     0.7896
                       0.6773    0.3433     0.8996
                       0.3453    0.8772     0.3453
Variables
Each variable is weighted in proportion to its involvement in the factor: the more involved a variable, the higher its loading (positive or negative, depending on the direction of the relationship). The scores of each sample on the original variables can be converted to a limited number of factor scores using a linear equation derived from the factor loading table, for example:

F1 = w11X1 + w12X2 + … + w1pXp

where the w coefficients are taken from the loading table for Factor 1 applied to the p observed variables.
Factors
The factors are unobserved and abstract; therefore, they have no direct interpretation on their own.
Once we have factor scores, we can use them as independent variables in a regression, for example:

Y = β0 + β1F1 + β2F2 + … + βmFm + ε
Types of Factor Analysis
Exploratory Factor Analysis
Confirmatory Factor Analysis

Applications of Factor Analysis
Some of the business situations in which Factor Analysis is used are:
Behavioral sciences and psychometrics
Social sciences
Marketing
Product management
Operations research
Other applied sciences that deal with large quantities of data

Overview of Clustering

Clustering is used to identify the intrinsic grouping in a set of objects and classify
them into relatively homogenous groups (called clusters) so that objects from the
same cluster are more similar to each other than objects from different clusters.
Cluster Dendrogram
A dendrogram is a graphical representation of a hierarchy of nested cluster solutions, starting from a one-cluster solution all the way through to an n-cluster solution.
Drawing a line across the dendrogram at a particular distance (perpendicular to its branches) shows the cluster solution at that level of distance.
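A minimal sketch of building a dendrogram from hierarchical clustering in Python, using a small made-up two-group dataset:

# A minimal sketch: hierarchical clustering and a dendrogram with scipy.
# The data are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(6)
# Two loose groups of points in 2-D (e.g., customers described by two attributes)
X = np.vstack([rng.normal([0, 0], 0.5, (10, 2)),
               rng.normal([3, 3], 0.5, (10, 2))])

Z = linkage(X, method="ward")   # hierarchy of nested cluster solutions
dendrogram(Z)                   # cutting at a given distance gives that solution
plt.xlabel("Object")
plt.ylabel("Distance")
plt.show()

labels = fcluster(Z, t=2, criterion="maxclust")   # the 2-cluster solution
print(labels)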
Method of Clustering
Hierarchical Methods
Partitioning Methods
Applications of Clustering
Market segmentation
Market structure analysis
Petroleum geology
Data mining
Pattern recognition
Image analysis
Biology and numerical taxonomy
Overview of Discriminant Analysis
The objective of Discriminant Analysis is to classify objects (people, items, etc.)
into two or more groups based on the features of the objects.
Approaches to Discriminant Analysis
Discriminant analysis is an analysis of dependence method where the dependent
variables are categorical in nature, dividing the set of observations into mutually
exclusive and collectively exhaustive groups.
A categorical variable classifies objects into categories (e.g., good/bad, high/medium/low, etc.). Typically, G - 1 variables (each a binary indicator) describe membership in G mutually exclusive and collectively exhaustive groups.
The output of discriminant analysis is an equation (similar to the regression equation) involving the independent variables, which calculates a discriminant score, along with a cut-off score used to assign each item to a group.
Commercially available statistics packages can be used to conduct this analysis.
Discriminant Analysis Tool Output Standard Form
Using the canonical discriminant function, a score can be calculated for each object whose group membership we want to predict. The decision about which group the object belongs to is made by comparing this score with a calculated cut-off score.
Canonical Discriminant Function Coefficients:



Functions at Group Centroids:

For each object, the discriminant score can be calculated using the equation.
This score can be compared to the cut-off score to determine into which group
the item can be classified.
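A minimal sketch of discriminant analysis with scikit-learn's LinearDiscriminantAnalysis; the two-group data are made up, and the fitted coefficients and intercept play the role of the discriminant function and cut-off described above.

# A minimal sketch: linear discriminant analysis with scikit-learn.
# The two-group data are made up for illustration.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(7)
# Group 0 and group 1 differ in the means of their two features
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([2, 2], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_, lda.intercept_)     # the discriminant function coefficients
new_objects = np.array([[0.5, 0.2], [1.8, 2.4]])
print(lda.predict(new_objects))      # predicted group membership for new objects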
Common Methods of Discriminant Analysis
Fisher's Approach
Mahalanobis' Approach

Applications of Discriminant Analysis
Product management
Marketing research
Bankruptcy prediction
Credit scoring
Face recognition
Solutions
Solution 1.1:
Let the diet contain x units of A and y units of B. Total cost = 2x + 4y
Objective Function: Minimize Z = 2x + 4y
Constraints:
10x + 25y ≥ 200
20x + 10y ≥ 100
15x + 20y ≥ 150
x ≥ 0, y ≥ 0
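As a cross-check on this formulation, a minimal sketch of solving it numerically with scipy.optimize.linprog (which expects ≤ constraints, so the ≥ rows are negated):

# A minimal sketch: solving the diet LP with scipy.optimize.linprog.
# linprog minimizes c·x subject to A_ub·x <= b_ub, so the >= constraints are negated.
from scipy.optimize import linprog

c = [2, 4]                       # minimize 2x + 4y
A_ub = [[-10, -25],              # 10x + 25y >= 200  ->  -10x - 25y <= -200
        [-20, -10],              # 20x + 10y >= 100
        [-15, -20]]              # 15x + 20y >= 150
b_ub = [-200, -100, -150]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
print(res.x, res.fun)            # optimal (x, y) and minimum cost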
Solution 1.2:
Step 1: Since x ≥ 0 and y ≥ 0, we consider only the first quadrant of the xy-plane.
Step 2: We draw straight lines for the equations
2x + y = 100
x + y = 80
To determine two points on the straight line 2x + y = 100:
Put y = 0, so 2x = 100 and x = 50; therefore (50, 0) is a point on the line.
Put x = 0, so y = 100; therefore (0, 100) is the other point on the line.
Plot these two points and draw the line through them to represent 2x + y = 100.
This line divides the first quadrant into two regions, say R1 and R2. Choose a point, say (1, 0), in R1. Since (1, 0) satisfies the inequality 2x + y ≤ 100, R1 is the required region for the constraint 2x + y ≤ 100.

Similarly, draw the straight line x + y = 80 by joining the points (0, 80) and (80, 0), and find the required region, say R1', for the constraint x + y ≤ 80.

The intersection of the regions R1 and R1' is the feasible region of the Linear Programming problem. Therefore every point in the shaded region OABC is a feasible solution, since each such point satisfies all the constraints, including the non-negativity constraints.
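A minimal sketch of reproducing this graphical solution with matplotlib, assuming the ≤ direction of the constraints as reconstructed above:

# A minimal sketch: plotting the feasible region OABC for the two constraints.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 100, 400)
plt.plot(x, 100 - 2 * x, label="2x + y = 100")
plt.plot(x, 80 - x, label="x + y = 80")

# Shade the points in the first quadrant satisfying both constraints
xx, yy = np.meshgrid(np.linspace(0, 100, 300), np.linspace(0, 100, 300))
feasible = ((2 * xx + yy <= 100) & (xx + yy <= 80)).astype(int)
plt.contourf(xx, yy, feasible, levels=[0.5, 1], alpha=0.3)

plt.xlim(0, 100)
plt.ylim(0, 100)
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()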
Solution 2.1:


Solution 2.2:


Solution 2.3:


Solution 3.1:
If the possible outcomes of an experiment are a1, a2, ..., an, and if the probabilities of these outcomes are p1, p2, ..., pn, then the expected value is
E = a1p1 + a2p2 + ... + anpn

Expected Value E = 0(0.50) + 1(0.40) + 2(0.10) = 0.6