
Student Study Guide and Notes

BAS 120: Business Analytics I


1 Version 1.0 Fall 2013
Table of Contents
P1: Program Overview Certificates
P2: Course Logistics
P3: Midterm and Final Projects
C1: Introduction to Business Analytics
C2: Analysis on Spreadsheets
C3: Data Visualization
C4: Descriptive Statistics
C5: Distributions and Data Models
C6: Sampling & Estimations
C7: Statistical Inference
A1: Boot Camp Material
A2: Sample Final Exam Questions by Chapter

Details: http://www.waketech.edu/programs-courses/credit/business-analytics
Preface 1: Business Analytics
Program Overview
Wake Tech Business Web Site
Details: http://www.waketech.edu/programs-courses/credit/business-analytics
What Makes a Good Analyst?
Skeptic

Our Tier II Certificate.
4 Business Analytic Courses.
Aligned : SAS Statistical Business Analyst.


Our Tier I Certificate.
4 Business Analytic Courses.
Aligned : SAS Base Programmer.

Business Analyst
Certificate
Business Intelligence
Certificate
First Community College.
8 Business Analytic Courses.
Total: 65 Hours.
Associates Degree in BA
Wake Tech Business Analytics Overview
Upon completion, students may sit for the BASE SAS Programmer Industry Certification.
BAS 120
Business
Analytics I

Intro to Business
Intelligence (C1-7)





Software: Excel

~ 90 Solved
Examples


BAS 121
Bus Analytics
Methods I

Intro to Predictive
Analytics (C8-12,18)





Excel Plus*

~ 75 Solved
Examples


BAS 220
Business
Analytics II

Intro to SAS Base
Essentials





SAS 9.3

~ 25 Solved
Examples


BAS 150
Business
Analytics Tools I

Intro to SAS Data
Manipulation





SAS 9.3

~ 25 Solved
Examples


* Utilized Frontline Analytic Solver Platform for Education, #1 Worldwide for MBA programs.
Articles
Cases
SAS
Materials
Articles
Cases
SAS
Materials
Business Intelligence Certificate Courses
Upon completion, students may sit for the SAS Statistical Business Analyst Certification.
BAS 221
Bus Analytics
Methods II

Intro Prescriptive
Analytics (C13-17)





Excel Plus*

~ 60 Solved
Examples


BAS 230
Business
Analytics III

SAS: Statistics I,
ANOVA, Regress.





SAS 9.3

~ 25 Solved
Examples


BAS 250
Bus Analytics
Tools II

SAS: Predictive
Modeling





SAS 9.3

~ 25 Solved
Examples


BAS 270
Bus Analytics
Practicum

Student Project
and Presentation





Any

~Designed Project


Program
Project
Articles
SAS
Materials
Articles
SAS
Materials
* Utilized Frontline Analytic Solver Platform for Education, #1 Worldwide for MBA programs.
Business Analyst Certificate Courses
Venues
Instructor Led
E-Learning and Accelerated E-Learning
Program Offers
Resources
E-Lectures , E-Solved Videos Series
Installed Computer Software (SAS, Frontline)
Installed Data Sets (Over 50 data sets)
Instructor Aid and Advising.
E-Supplement, Boot Camps, ILC
Opportunities
Internal Competitive Projects (PBL External )
Industry Certification Exams
Certification Support (Independent from Course)

SAS certification exam: (Not part of course)
Sample exams and question sets for prep use.
50% discount on exam cost for students.
$90 savings.
Plan : Exam to be given at Wake Tech campus.
Time and effort savings.
Potential Certification and Recognitions
Competitions / Student Recognition

BAS 120: Outstanding Data Visualization Project/Video.
BAS 121: Outstanding Regression Model Project/Video.
BAS 220: Outstanding SAS Basic Programming Exam Prep Score.
PBL : Spring 2013: Student won NC Statistical Analysis.

E-Lecture and E-Solved
Details: BAS 120 Blackboard site at www.waketech.edu
Preface 2: BAS 120 Logistics

Agenda:
Course Introduction.
Tour of Blackboard and
YouTube Video Site.


Mr. Rankin terankin1@waketech.edu
Welcome to Business Analytics.

Introduction
Tsunami of data: volume doubles each year.
90% of the world's data was created in the last 3-4 years.
Facilitated by more storage.
Facilitated by more tools to analyze.
Complex: it's not just more data, it's
different data. New sources of data. Unstructured.
Complex: shortage of talent.

Business Analytics: use of data to improve
business decisions.

Facilitators
A few Examples
Marketing



Fraud



Entertainment

The list goes on: Health Care, Finance, Operations.
Brand Sentiment
Higher NPS
Predictive Maintenance
Less Downtime
Network Optimization
Lower Cost
Propensity to Churn
Greater Retention
Real-time Demand/
Supply Forecast
More Efficient
360° Customer View
Loyal Customers
WHAT IF YOU COULD TURN NEW SIGNALS
INTO BUSINESS VALUE?
Asset Tracking
Increase Productivity
Personalized Care
Loyal Customers
Product Recommendation
More Sales
Risk Mitigation, Real-time
Retain Market Value
Insider Threats
Greater Security
Fraud Detection
Lower Risk
Current State of BA
Still in emerging stage.
97% of companies (over $100M) use BA.
Mainly reporting and spreadsheets.
Intuition still drives most business decisions.
Cautious adoption.
Companies desire: bottom-line impact.
Roadblocks to adoption:
DATA (accuracy, consistency, access, volume)
Lack of analytic talent.
Role of culture.
Mr. Rankin terankin1@waketech.edu
MBA UNC-CH.
30 years in
Business.
2 Years
Instructing.
Married 25 years
with 2 children.
Mr. Rankin terankin1@waketech.edu
Textbook for Course
Title Business Analytics
Author James R. Evans
ISBN 978-0-13-295061-9
Publisher Prentice Hall PTR
Publication Date February 10, 2012
Binding Trade Paper
Price $180.00

Weekly Lessons or Modules
Due: Friday at Midnight (Soft)
Saturday at Midnight (Hard)
Normal Module :
Checklist / Reading / Video Lecture /
Study Material
Graded Quiz
Graded Homework
Graded Discussion
One Extra Credit Opportunity.
Mr. Rankin terankin1@waketech.edu
How Course Works.

Purchase Textbook

Blackboard

Wake Tech Email

Complete M1.

Mr. Rankin terankin1@waketech.edu
Week 1 Checklist.

40% Mid Term & Final Exam
30% Mid Term & Final Project
10% Case Discussions
10% Quiz (3 attempts)
10% Assignments
Extra Credit up to 5 points.

Wake Tech 10-Point Scale.

Mr. Rankin terankin1@waketech.edu
How Grades Work.


The My Grades tab on Blackboard will always be current.


No Late Homework.

90% Attendance
Attendance taken every class.
No more than 2 missed
postings.


Mr. Rankin terankin1@waketech.edu
Instructor Policies.

terankin1@waketech.edu

Main Campus : Holding
Hall Room 108 H.

Office hour on Blackboard.


Mr. Rankin terankin1@waketech.edu
Contact Information.
Discussion 1: Introduction
A short ice breaker.
In-class tour of Blackboard or, for online students, a video tour.
Details: BAS 120 Blackboard site at www.waketech.edu
P3: Mid Term and Final Project
Fictional Company
Performance Lawn Care
Data Sets
2 Projects
30% of Final Grade
1. Data Visualization 2. Data Sampling
Mid Term Project: 15% of Final Grade, Due Oct 8th EOD
- Write a formal report summarizing your results for all three parts of
this case. (Pg 1 of 2)
Performance Lawn Care (PLC) Data File: PLC
1. PLC originally produced lawn mowers; however, a significant portion of sales over
recent years has come from the growing small tractor market. PLC sells products
worldwide. Three years ago a new region was opened to serve China, where a
booming market for small tractors has been established. PLC has always put an
emphasis on both quality and ease of product use. Before making any
decisions, Ms. Burke (Executive VP of Operations) is asking you to construct an
appropriate set of worksheets and/or charts to summarize and present your
conclusions from the following set of data:
Dealer Satisfaction
End-User Satisfaction
Complaints
Mower Unit Sales
Tractor Unit Sales
On-Time Deliveries
Defects After Delivery
Response Time

Mid Term Project: 15% of Final Grade, Due Oct 8th EOD
- Write a formal report summarizing your results for all three parts of
this case. (Pg 2 of 2)
Performance Lawn Care (PLC) Data File: PLC
2. The supply chain worksheets provide cost data associated with logistics between
existing plants and customers as well as proposed new plants. Ms. Burke wants
you to extract the records associated with the unit shipping costs of proposed
plant locations, compare the costs of existing locations against those of the
proposed locations using quartiles, and provide a recommendation.
3. Ms. Burke would also like a quantitative summary of the average response for
each of the customer attributes in the worksheet 2012 Customer Survey for each
market region as a cross tabulation (use Pivot Tables), along with frequency
distributions, histograms, and quartiles of these data.
Data Visualization Contest: Extra Credit
First place will be awarded to the student with
the best Data Visualization Project. To be
eligible, the student must, in addition to the
written report, create a Camtasia video (free
download version available) no longer than
5 minutes for judging.

All students who complete both a quality
written assignment and a video presentation
will be eligible for up to 5 points of extra credit
added to their final grade in the course.

Videos and reports will be judged by a panel.

Videos and reports are due EOD October 8th.

The winner will be announced on or before
EOD Oct 15th.
Final Project: 15% of Final Grade, Due Nov 22nd EOD
- Write a formal report summarizing your results for all five parts of
this case. (Pg 1 of 2)
Performance Lawn Care (PLC) Data File: PLC
1. What proportion of customers rate the company with
top box survey responses (defined as scale
levels 4 or 5) on quality, ease of use, price, and service in
the 2012 Customer Survey worksheet? How do these
proportions differ by region?
2. What estimates, with reasonable assurance, can PLC give
customers for response times to customer service calls?
Explain what you mean by reasonable assurance and how
the data support your answer.
3. Engineering has collected data on alternative process
costs for building transmissions in the worksheet
Transmission Costs. Can you determine whether one of
the proposed processes is better than the current
process? Explain your answer and support it with data.
Final Project: 15% of Final Grade, Due Nov 22nd EOD
- Write a formal report summarizing your results for all five parts of
this case. (Pg 2 of 2)
Performance Lawn Care (PLC) Data File: PLC
4. What would be a confidence interval for an additional sample of mower test
performance as in the worksheet Mower Test?
5. How many blade weights must be measured to find a 95% confidence interval
for the mean blade weight with a sampling error of at most 0.2? What if the
sampling error is specified as 0.1?
Details: BAS 120 Blackboard site at www.waketech.edu
BAS 120 Chapter 1: Introduction
to Business Analytics
What is Business Analytics ?
Course: BAS 120
Chapter 1 (Part 1 of 2)
Topics
What is Business Analytics?
Evolution.
Importance.
Application.
Scope.

Using Data to Gain Business Insights.
Define

Business Analytics: the use of data,
information technology, statistical
analysis, quantitative methods, and
mathematical or computer-based
models to help business managers gain
insights about their business
operations.
What is Business Analytics?
Example 0.1 Question ?
Who are Netflix's most profitable customers?

Frequent Movie
Viewer

In-Frequent Movie
Viewer
Question ?
Who do you provide the best customer service
to?

Your most
profitable
customers?

Or your less
profitable
customers?
Question ?
So who do you think Netflix gives priority
shipping to?

Frequent Movie
Viewer

In-Frequent Movie
Viewer
Netflix
Revenue: $5M in 1999; over $1B in 2006.
Analytics: Movie recommendation engine.
$1M prize contest.
Analytics to measure customer behavior.
Throttling: infrequent-use customers (most
profitable) are given priority in shipping.
Analytics to measure buying patterns.
Paying for distribution rights of DVDs (look at
success of related movies).
Question ?
What is a major challenge for Netflix in the
future? What analytic information could help
Netflix?
What is Business Analytics ?
Course: BAS 120
Chapter 1
Topics
What is Business Analytics?
Evolution.
Importance.
Application.
Scope.

Evolution : Printing Press to Phone

Year : 1440.
Major Change in the World: ~ 400 years apart.
Year : 1870.
Evolution : The Computer

Year : 1940-50.
First opportunity to handle large amounts of data.
Gave growth to:
Analysis by various names:
Business Intelligence.
Operations Research.
Management Science.
Decision Support Systems.
Evolution: The Personal Computer

Year : 1974.
Further growth or evolution.
Gave growth to:
Widespread Access:
Limited Speed.
Available Data.

Evolution: The Web

Year : 1992
Things started exploding.
Gave growth to:
Ability to:
Share.
Compare.
Access.
Widespread growth.
Evolution: Example

Microsoft
Rapid Change
Wikipedia
Evolution: Where are we now ?

Business Analytics: still in emerging stage.
Source: SAS report on Current State of Business Analytics
97% of companies (over $100M) use BA.
Mainly reporting and spreadsheets.
Intuition still drives most business decisions.
Cautious adoption.
Companies desire: bottom-line impact.
Roadblocks to adoption:
DATA (accuracy, consistency, access, volume)
Lack of analytic talent.
Role of culture.
What Companies Are Telling Us About
Analytics
We are helping clients harness their vast amounts of
customer and operational data
We cannot find enough new grads with the right quantitative
skills
We compete on the basis of better knowledge of their
customers, using analytics
The riskier our business problems the more we rely on
analytics
After implementing our ERP system we are mining that data,
and using data better in different ways

What Business Leaders Are Saying
In God we trust, all others bring data
Do you think that, or do you know that?
Those who succeed with six sigma, and then
advance in our company, have the better
quantitative skills
We are basing our strategy on analytics,
especially customer analytics

What is Business Analytics ?
Course: BAS 120
Chapter 1
Topics
What is Business Analytics?
Evolution.
Importance.
Application.
Scope.
Better understanding : Better decisions.
Importance


Strong relationship between business analytics
and business performance.

Understanding data supports better business
decisions.

Fast pace of change: information becoming
more vital for businesses to remain
competitive.

Importance: Aids Business Performance

Importance: Tsunami of Data

Tsunami of Data.
"Every two days now we create as much information as we did from the
dawn of civilization up until 2003."
Eric Schmidt, CEO of Google
Importance: Data Source is Different

The sources of data are changing.
New kinds of Data.
# Tweets > Planet's Population:
The number of tweets per month now exceeds the
population of the planet.

Source: SAP report

Importance: Data Source is Different

FYI: For every tweet there are ~1,000 emails sent.
Technology advancements.
Importance: Facilitators

Numerous players and niche markets.
Importance: Large Landscape

High on Executive List.
83%: share of executives who agree that the importance of
using information effectively to run their business has never
been greater.
Source: Business Week Research Services
Importance: Executives
High Rate of Return.
$10.66 : "In
a recent report,
Nucleus Research
found that for every
dollar a company
spends on analytics,
it gets back $10.66."

Source: Nucleus Research

Importance: Profitable

Discussion 2: How Big is Big Data ?
What is Business Analytics ?
Course: BAS 120
Chapter 1
Topics
What is Business Analytics?
Evolution.
Importance.
Application.
Scope.
Applications in all areas of business.
Application : Overview

Source: Thomas Davenport: Competing on Analytics : Harvard Business Review: January 2006
Function | Description | Exemplars
Supply Chain | Simulate and optimize supply chain flow; reduce inventory and stock-outs. | Dell, Wal-Mart, Amazon.
Customer Service | Identify customers with greatest profit potential; increase likelihood that they will want the product or service offering; retain their loyalty. | Harrah's, Capital One, Barclays.
Pricing | Identify the price that will maximize yield or profits. | Progressive, Marriott.
Product / Service | Detect quality problems early and minimize them. | Honda, Intel.
Financial | Better understand the drivers of financial performance and the effect of nonfinancial factors. | MCI, Verizon.
High Rate of Return.
14 : the average
number of reward
tags a person in
the US has.
Understand
customer patterns
to better serve
customers.
Application : Customer Relations

High Rate of Return.
Suggest: Most have experienced being the
target of cross-selling on websites, whether
it is Amazon or Netflix.
Application : Marketing: Cross Selling
Large opportunity as small % improvements = big savings.
$43B: value of WalMart inventory. A 1%
improvement = $430M.
90%: percent of Americans who live within
15 minutes of a WalMart. (The US accounts
for about 60% of global sales.)
Application : Supply Chain
Dynamic Pricing, Web Content.
3,700: properties in 70 countries. Marriott
International offers global travelers hospitality
choices across 18 brands.
~65%: occupancy rate. Data-driven pricing
strategies can deliver major results.
Application : Pricing
Real Time Customer Data to Drive Results.
Pain Threshold: Harrah's
knows your specific pain threshold
and at what point you'll have lost
enough to quit.
Harrah's staff in the back room
track everything, and when the
computer flags someone coming
close to their limit, a member of
Harrah's floor staff approaches the
soon-to-give-up gambler and
intervenes, offering a free steak
dinner, another $15 credit, or
even tickets to a show that evening.
The result: the gambler keeps
gambling.
Application : Entertainment
Real Time Customer Data to Drive Results.
Product Placement :
Milk ?
Health Food?
Pancake Syrup?
Picnic Items?
High Profit Margin Items?

Application : Product Placement

Real Time Customer Data to Drive Results.
Number of Employees,
Skills (Technical
Support), Hours of
Work :
Provide special level of service
for most profitable
customers?
What is an acceptable wait
time for customers?

Application : Staffing

Real Time Customer Data to Drive Results.
~$190B
The amount merchants in the
United States lose each year to
credit card fraud.
Source: Lexis Nexis (2009)
Application : Financial: Fraud
Student Input
When to markdown
Seasonal Inventory?
What Factors to consider?
What Data would you
desire?
Example 1.1 Retail Markdown

Student Input
Tom lives in Raleigh
Tom often travels to
Toronto.
Tom often makes large
purchases on his credit
card.
What else would you
want to know about
Tom's purchasing habits?
Example 1.1a Fraud Example
What is Business Analytics ?
Course: BAS 120
Chapter 1
Topics
What is Business Analytics?
Evolution.
Importance.
Application.
Scope.
Addressing three areas of Analytics.
Scope : Degree of Complexity
Descriptive (describes), Predictive (predicts), Prescriptive
(advises)
Scope

Descriptive analytics
Uses data to understand past and present.
Allow the data to describe the situation.
Example: The class average on the last
exam was 81%.
Scope : Descriptive
Descriptive (describes), Predictive (predicts), Prescriptive
(advises)
Scope

Predictive analytics
Analysis of past performance.
Use past performance to forecast future
performance.
Example: Make sales calls on customers
who have recently purchased a product.
Predictive analytics might indicate that
recent customers are more likely to make
additional purchases than customers who
have not visited your firm for a long time.
Scope : Predictive
Descriptive (describes), Predictive (predicts), Prescriptive
(advises)
Scope

Prescriptive analytics
Use of optimization techniques.
Analysis of what should be done vs. what is
happening.
Use data to arrive at a mathematical
optimal solution.
Example: The route and frequency with which a
bank should visit its ATMs.
Scope : Prescriptive
Student Exercise 1.2
Data Set: None.

Problem:

Your Job: Suggest some metrics that a hotel might want to collect about its
guests. How might these metrics be used with business analytics to support
decisions at the hotel?
5 Learning Objectives in Lesson
Learning Objectives
1. Be able to define what business analytics is.
2. Understand the evolution of both US business
and business analytics.
3. Understand and explain the importance of
business analytics.
4. Understand and explain the various
applications of business analytics.
5. Understand the different scope of business
analytics.

Data For Business Analytics
Course: BAS 120
Chapter 1 (Part 2 of 2)
Topics
Types of Data
Decision Models
Problem Solving
Data Validation
Data Ethics
Helps us organize data.
Data & Databases

Data or data set: only a collection of data.
Database: collection of related data.
Field or Attributes: Individual elements.
Entries or Records: Actual measures.
Databases provide method to organize.
Fields | Customer Name | Customer Number | Invoice Number | Amount | Product | Date
Entries or Records | Johnson | 100 | 525 | $65.00 | Repair | 1/15/2013
Data : Key Terms
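
The key terms above can be sketched as a small data structure. This is only an illustration: the class name InvoiceRecord and the attribute names are assumptions chosen to mirror the example record on the slide, not part of the course material.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class InvoiceRecord:
    """One entry (record); each attribute below is a field."""
    customer_name: str
    customer_number: int
    invoice_number: int
    amount: float          # dollars
    product: str
    invoice_date: date


# A database, in the sense used here, is a collection of related records.
database = [
    InvoiceRecord("Johnson", 100, 525, 65.00, "Repair", date(2013, 1, 15)),
]

# The field names are shared by every record in the collection.
field_names = [f.name for f in fields(InvoiceRecord)]
print(field_names)
print(database[0])
```

Each record holds one actual measurement per field, which is what lets a database organize the data.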
How we measure performance.
Metric

Metric: a unit of measurement that provides a
way to objectively quantify performance.
Examples:
The score of a game.
A business's earnings.
Quality: Defects per million.
Measurement: the act of obtaining data
associated with a metric.
Data : Key Terms
Data Classifications.
Variety of Terms


Data Types
DATA
  Qualitative (often called Character)
    Categorical
    Ordinal
  Quantitative (often called Numerical)
    Interval: Discrete or Continuous
    Ratio: Discrete or Continuous
Using Data to Gain Insights.
Type | Comment | Example
Categorical | Mutually exclusive but not ordered; values are only put into categories. | Eye color, geographical regions, employee classifications.
Ordinal | Ordered or ranked. Order matters, but the difference between ranks doesn't. | Hospital pain scale, college football rankings.
Interval | Constant differences between values, but no natural zero. | Time and temperature.
Ratio | Continuous and has a natural zero. | Sales volumes.
Data Types
Student Input
Example 1.2 Sales Transaction Database

Which data type is each field?
Cust ID | Region | Payment | Transaction Code | Source | Amount | Product | Time Of Day
10001 | East | Paypal | 93816545 | Web | $20.19 | DVD | 22:19
10002 | West | Credit | 74083490 | Web | $17.85 | DVD | 13:27
10003 | North | Credit | 64942368 | Web | $23.98 | DVD | 14:27
10004 | West | Paypal | 70560957 | Email | $23.51 | Book | 15:38
10005 | South | Credit | 35208817 | Web | $15.33 | Book | 15:21
10006 | West | Paypal | 20978903 | Email | $17.30 | DVD | 13:11
10007 | East | Credit | 80103311 | Web | $177.72 | Book | 21:59
10008 | West | Credit | 14132683 | Web | $21.76 | Book | 4:04
Example 1.2 A Sales Transaction Database File
Data for Business Analytics
[Figure 1.1: sales transaction database file, with entities, records, and fields (attributes) labeled]
Example 1.3
Classifying Data Elements in a Purchasing Database
Data for Business Analytics
[Figure 1.2: Classifying Data Elements in a Purchasing Database]
Data Types

Fields | Customer Name | Customer Number | Invoice Number | Amount | Product | Date
Entries or Records | Johnson | 100 | 525 | $65.00 | Repair | 1/15/2013
Data Type | Categorical | Ordinal | Ordinal | Ratio | Categorical | Interval
Data Types
Student Exercise 1.4
Data Set: Sales Transaction Database.

Problem: Classify each of the data elements as categorical, ordinal, interval or
ratio data.

Your Job: Ensure all data elements are properly classified.
Two types of metrics.
Metric

Discrete Metrics
Either (Y/N) so they can be counted.
Examples:
On time deliveries.
Pass inspection.
Order completeness.
Continuous Metrics
Based upon a continuous scale of measure.
Any metric with dollars, time, weight.
Data : Discrete vs. Continuous
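
The difference between the two kinds of metric can be sketched in a few lines. The delivery records below are made up for illustration: a discrete metric is counted (on time: yes or no), while a continuous metric is measured on a scale (hours to deliver).

```python
# Hypothetical delivery log; the values are illustrative only.
deliveries = [
    {"on_time": True, "hours_to_deliver": 20.5},
    {"on_time": True, "hours_to_deliver": 18.0},
    {"on_time": False, "hours_to_deliver": 30.0},
    {"on_time": True, "hours_to_deliver": 23.5},
]

# Discrete metric: each delivery is either on time or not, so we count.
on_time_count = sum(1 for d in deliveries if d["on_time"])
on_time_rate = on_time_count / len(deliveries)

# Continuous metric: time is measured on a continuous scale, so we average.
average_hours = sum(d["hours_to_deliver"] for d in deliveries) / len(deliveries)

print(f"On-time deliveries: {on_time_count} of {len(deliveries)} ({on_time_rate:.0%})")
print(f"Average delivery time: {average_hours:.1f} hours")
```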
Discussion 3: Competing on Analytics ?
Data For Business Analytics
Course: BAS 120
Chapter 1 (Part 2 of 2)
Topics
Types of Data
Decision Models
Problem Solving
Data Validation
Data Ethics
Decision Models facilitate making decisions.
Decision Models

Decision Model: a logical or mathematical
representation of a problem or business situation.
Forms of a model.
A verbal description.
Graph or sketch.
Mathematical model or function.
Decision Models
Inputs Model Output
Decision Models facilitate making decisions.
Family Budget Example

Data Constants
Constants in the model.
Examples: machine capacity,
family size.
Uncontrolled Variables
Quantities that change and are not
controlled by the decision maker
(often uncertain).
Example: bonus.
Decision Variables
Quantities that change but
are controlled by the decision
maker.
Example: staffing levels.
Decision Models Inputs
Example 1.5 A Sales-Promotion Model
In the grocery industry, managers typically need
to know how best to use pricing, coupons and
advertising strategies to influence sales.
Using Business Analytics, a grocer can develop a
model that predicts sales using price, coupons
and advertising.
Decision Models
Decision Models
Sales = 500 - 0.05(price) + 30(coupons)
+ 0.08(advertising) + 0.25(price)(advertising)
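
The sales-promotion model can be written as a function to try out different inputs. This is a minimal sketch: it assumes the price term is subtracted (coefficient -0.05), and the function name and the sample inputs are illustrative, not from the text.

```python
def predicted_sales(price, coupons, advertising):
    """Sales-promotion model from Example 1.5.

    Assumption: the price term enters with a negative sign.
    """
    return (
        500
        - 0.05 * price
        + 30 * coupons
        + 0.08 * advertising
        + 0.25 * price * advertising
    )


# Hypothetical inputs, chosen only to show how the model is evaluated.
print(predicted_sales(price=10, coupons=0, advertising=100))
print(predicted_sales(price=10, coupons=1, advertising=100))
```

Comparing the two calls shows the effect of one decision variable (coupons) while the other inputs are held fixed.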

Influence Diagrams can help us formulate Models.
Influence Diagram

Often in developing a decision model it is
helpful to produce an influence diagram.







TC = FC + (UVC * QP).


Total Costs
Total Fixed Costs
Total Variable
Costs
Unit Variable Costs Quantity Produced
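
The influence diagram above translates directly into a small function: total variable cost is built from its two inputs, then added to fixed cost. The function name and the sample numbers are assumptions for illustration.

```python
def total_cost(fixed_cost, unit_variable_cost, quantity_produced):
    """TC = FC + (UVC * QP), following the influence diagram."""
    total_variable_cost = unit_variable_cost * quantity_produced
    return fixed_cost + total_variable_cost


# Hypothetical inputs: $50,000 fixed cost, $125 per unit, 1,000 units.
print(total_cost(fixed_cost=50_000, unit_variable_cost=125, quantity_produced=1_000))
```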
Student Exercise 1.7
Data Set: None

Problem: A bank developed a model for predicting the average checking and
savings account balance as balance = -17,732 + 367 x age + 1,300 x years of
education + 0.116 x household wealth.

Your Job:
Explain how to interpret the numbers in this model.
Suppose that a customer is 36 years old, is a college graduate (so years of
education is equal to 16), and has a household wealth of $175,000. What is
the predicted bank balance?
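
One way to check the arithmetic for this exercise is to code the model as stated. The function name is an assumption; the coefficients are the ones given in the problem.

```python
def predicted_balance(age, years_education, household_wealth):
    """Bank balance model from Student Exercise 1.7."""
    return -17_732 + 367 * age + 1_300 * years_education + 0.116 * household_wealth


# Term by term: 367 * 36 = 13,212; 1,300 * 16 = 20,800; 0.116 * 175,000 = 20,300.
balance = predicted_balance(age=36, years_education=16, household_wealth=175_000)
print(f"Predicted balance: ${balance:,.2f}")
```

Each coefficient is the change in predicted balance for a one-unit change in that input with the others held fixed, for example $367 per additional year of age.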
Uncertainty and Risk
Uncertainty & Risk

Uncertainty: imperfect
knowledge of what will
happen. Ex: Will the plane
be on time?
Risk: consequences of
what actually happens.
Ex: If I am scheduled to
be on the plane, there is
risk; otherwise there is no risk to
me.


Uncertainty vs. Risk
Prescriptive Analytics: the model provides an optimal solution.
Optimization: the process of finding the best solution.
Objective function:
normally a function that
minimizes or maximizes a
variable (such as costs,
profits, or travel distance).
Constraints: limits on the solution. Ex: avoid
highways.
Optimal solution: the
solution the model provides.

Optimization
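
The pieces of an optimization problem can be sketched with a tiny route-choice example. The routes and distances are hypothetical: the objective function is travel distance (to be minimized), the constraint is "avoid highways", and the optimal solution is the best route that satisfies the constraint.

```python
# Hypothetical candidate routes; values are illustrative only.
routes = [
    {"name": "A", "miles": 12.0, "uses_highway": True},
    {"name": "B", "miles": 15.5, "uses_highway": False},
    {"name": "C", "miles": 14.0, "uses_highway": False},
]


def objective(route):
    """Objective function: travel distance, which we want to minimize."""
    return route["miles"]


# Constraint: avoid highways, so only some routes are feasible.
feasible_routes = [r for r in routes if not r["uses_highway"]]

# Optimal solution: the feasible route with the smallest objective value.
optimal_route = min(feasible_routes, key=objective)
print(optimal_route["name"], optimal_route["miles"])
```

Route A is shortest overall but violates the constraint, so it is never considered. Real problems usually have far too many candidates to list, which is where optimization tools come in.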
Deterministic or Stochastic
Deterministic vs. Stochastic

Models can be either
deterministic or stochastic.
Deterministic: input
information is known or
assumed to be known (in
reality, having all inputs
known is rare). Ex: short-term
planetary orbit around the sun.
Stochastic: input information
is uncertain. Ex: weather.
Inputs Known vs. Unknown
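
The same model can be run both ways. In this sketch (the travel-time model, the speed range, and the seed are all assumptions for illustration), the deterministic run treats speed as known, while the stochastic run draws speed at random many times and looks at the spread of results.

```python
import random


def travel_time_hours(distance_miles, speed_mph):
    """A simple model: time = distance / speed."""
    return distance_miles / speed_mph


# Deterministic: the input (speed) is assumed to be known exactly.
deterministic_hours = travel_time_hours(120, 60)

# Stochastic: speed is uncertain, so we simulate many possible values.
rng = random.Random(42)  # fixed seed so the run is repeatable
simulated_hours = [
    travel_time_hours(120, rng.uniform(45, 65)) for _ in range(1_000)
]
average_hours = sum(simulated_hours) / len(simulated_hours)

print(f"Deterministic estimate: {deterministic_hours:.2f} hours")
print(f"Simulated average: {average_hours:.2f} hours")
print(f"Simulated range: {min(simulated_hours):.2f} to {max(simulated_hours):.2f} hours")
```

The deterministic model gives a single answer. The stochastic version gives a range of outcomes, which is the information needed to reason about risk.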
Student Exercise 1.13
Data Set: None

Problem: A manufacturer of mp3 players is preparing to set the price on a new
model. Demand is thought to depend on the price and is represented by the
model:
D = 2,000 - 3P
The accounting department estimates that the total costs can be represented
by
C = 5,000 + 4D

Your Job:
Develop a model in Excel for the total profit.
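
Before building the spreadsheet, the logic can be sketched in code. This sketch makes two assumptions not spelled out in the problem: demand falls with price (D = 2,000 - 3P), and profit is revenue (P x D) minus total cost. The function names and trial prices are illustrative.

```python
def demand(price):
    """Demand model: D = 2,000 - 3P (assumes the price term is subtracted)."""
    return 2_000 - 3 * price


def production_cost(units):
    """Cost model: C = 5,000 + 4D."""
    return 5_000 + 4 * units


def profit(price):
    """Assumed definition: profit = revenue - cost, with revenue = P * D."""
    units = demand(price)
    return price * units - production_cost(units)


# Try a few hypothetical prices, as one would in a spreadsheet column.
for trial_price in (100, 200, 300, 400):
    print(trial_price, demand(trial_price), profit(trial_price))
```

In Excel the same structure would be one cell for price, one formula cell each for demand and cost, and a final formula cell for profit.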
Data For Business Analytics
Course: BAS 120
Chapter 1 (Part 2 of 2)
Topics
Types of Data
Decision Models
Problem Solving
Data Validation
Data Ethics
Problem Solving Steps
Problem Solving

Problem solving steps
1. Recognition of a problem.
Gap between actual and expected results.
2. Defining the problem.
Involve all stakeholders.
3. Structuring the problem.
Goals, and constraints.
4. Analyzing the problem.
Key for analytics, data driven decisions.
5. Interpreting results and making decision.
Models are rarely perfect; human judgment is still needed.
6. Implementing the solution.
Buy in, adequate resource, monitoring.
Steps in Problem Solving
The process is iterative.
Problem Solving

Normal Tasks for the Business Analyst
Identify and access the data sources.
Combine the data sources.
Transform the data and variables.
Clean data (missing, wrong category, etc)
Explore the data and describe data.
Visualize patterns of data with plots.
Analyze and model data.
Validate Models.
Implement and Update.
Large Problems : Team with Roles
The process is iterative.
Iterative Process
Define business objective
Select Data
Explore Input Data
Prepare and Repair Data
Transform Input data
Assess Results
Deploy Models
Apply Analysis
Analytics in Practice
Developing Effective Analytical Tools
at Hewlett-Packard
Will analytics solve the problem?
Can they leverage an existing solution?
Is a decision model really needed?
Guidelines for successful implementation:
Use prototyping.
Build insight, not black boxes.
Remove unneeded complexity.
Partner with end users in discovery and design.
Develop an analytic champion.
Problem Solving and Decision Making
Data For Business Analytics
Course: BAS 120
Chapter 1 (Part 2 of 2)
Topics
Types of Data
Decision Models
Problem Solving
Data Validation
Data Ethics
Data Cleansing or Validation
Form
Mandatory fields are entered; no missing data.
Field
Ensure categorical fields contain categorical data.
Range or Graphing
Checking for data that may be in error.
Tools available to aid in validation.
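
The three kinds of check (form, field, range) can be sketched as a small validation function. Everything named here is hypothetical: the field names, the allowed regions, and the amount range are assumptions loosely based on the sales transaction example, not a prescribed tool.

```python
def validate_record(record, required_fields, allowed_regions, amount_range):
    """Return a list of problems found in one record (empty list = clean)."""
    problems = []

    # Form check: mandatory fields must be present and non-empty.
    for name in required_fields:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")

    # Field check: a categorical field must hold one of its categories.
    region = record.get("region")
    if region not in (None, "") and region not in allowed_regions:
        problems.append(f"unexpected region {region!r}")

    # Range check: flag numeric values that may be in error.
    low, high = amount_range
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and not (low <= amount <= high):
        problems.append(f"amount {amount} outside {low}-{high}")

    return problems


REQUIRED = ("cust_id", "region", "amount")      # assumed mandatory fields
REGIONS = {"East", "West", "North", "South"}    # assumed valid categories
AMOUNT_RANGE = (0, 100)                         # assumed plausible range

print(validate_record({"cust_id": 10001, "region": "East", "amount": 20.19},
                      REQUIRED, REGIONS, AMOUNT_RANGE))
print(validate_record({"cust_id": 10007, "region": "East", "amount": 177.72},
                      REQUIRED, REGIONS, AMOUNT_RANGE))
```

A range flag does not prove an error. A large amount may be a genuine sale, so flagged records still need a person to review them.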
Common Data Problems
Data

Data may be missing values.
Samples might not be representative.
Categorical data might have too many values.
Numerical data may have numerous outliers.
Meaning of data may change over time.
New employees in roles might categorize data
differently.
Entry of data may be easy.
One business sold to another.
Data might be coded incorrectly.
Other common problems with data
114 Version 1.0 Fall 2013
Common Data Problems
Are dates numerical or categorical values?

Are dates written the same way around the globe?

What about currency: do commas mean the same thing
around the globe?

If a number is very long but the column is only 8 characters wide,
what does the computer do?
What should the computer do with missing values if we
are trying to sum a row or column?



Question (Other Problems)
115 Version 1.0 Fall 2013
Data Ethics
Falsification.
Bias the results.
Poor data gathering.
Bias the results.
Poor data storage.
Compromising information.
Confidentiality.

Data is powerful; be respectful.
116 Version 1.0 Fall 2013
117
30% of security violations come from inside the organization.
Source: 2009 Open Security Foundation report.

Cyber criminals are aggressively collaborating, selling each other
their wares, and developing expertise in specific tactics and
techniques.
Source: 2009 Cisco Mid-Year Security Report.

Only 28% of US small businesses have a formal Internet security
policy.
Source: 2009 National Small Business Study, National Cyber Security Alliance & Symantec.

85% of data breaches occur at the small business level.
Source: Visa

Data Ethics: Of Interest
Version 1.0 Fall 2013
Identify Types of Suspicious Behavior.
Identify in advance what constitutes suspicious behavior. This is
often referred to as the "red flags" of identity theft.

Develop Policies That Will Detect Suspicious Events Early and
Train Your Employees.
Put policies into place that will help you and your employees
identify a red flag and catch suspicious events early...or even as
they occur.

Respond to Suspicious Behavior.
Detecting red flags needs to be matched with potential action
plans. The type of action will depend on the type of red flag...and
the risk that red flag could lead to identity theft.

Write It Down. Type up the lists you just created, above:

118
Data Ethics: Suggestions from the BBB
Version 1.0 Fall 2013
Version 1.0 Fall 2013 119
Student Exercise 1.14
Data Set: None

Problem: The demand for airline travel is quite sensitive to price. Typically,
there is an inverse relationship between demand and price: when price
decreases, demand increases and vice versa. One major airline has found that
when the price (P) for round trip between Chicago and Los Angeles is $600, the
demand (D) is 500 passengers per day. When the price is reduced to $300,
demand is 1,200 passengers per day.

Your Job:
Plot these points on a coordinate system and develop a linear model that
relates demand to price.
Develop a prescriptive model that will determine what price to charge in
order to maximize the total revenue.
By trial and error, can you find the optimal solution that maximizes total
revenue?
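One way to check a hand-built model for this exercise is a short script. The sketch below fits a straight line through the two price/demand points given in the problem and then tries whole-dollar prices to see which gives the largest revenue. The function names and the $0 to $800 search range are illustrative assumptions, not part of the course materials.

```python
# Two observed (price, demand) points from the exercise.
p1, d1 = 600.0, 500.0
p2, d2 = 300.0, 1200.0

# Linear demand model D = a + b * P passing through both points.
b = (d2 - d1) / (p2 - p1)   # slope: change in demand per dollar of price
a = d1 - b * p1             # intercept: demand at a price of zero


def demand(price):
    """Predicted passengers per day at the given price."""
    return a + b * price


def revenue(price):
    """Total daily revenue = price * predicted demand."""
    return price * demand(price)


# Trial and error over whole-dollar prices (search range is an assumption).
best_price = max(range(0, 801), key=revenue)

print(f"Demand model: D = {a:.0f} + ({b:.4f}) * P")
print(f"Best whole-dollar price: ${best_price}")
print(f"Revenue at that price: ${revenue(best_price):,.2f}")
```

In a spreadsheet, the same trial-and-error search is a column of prices next to a column of price times demand.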
Algorithm
Business analytics
Business intelligence
Categorical (nominal)
data
Constraint
Continuous metric
Data set
Database
Decision model

Version 1.0 Fall 2013 1-120
Chapter 1 - Key Terms
Decision support
systems
Descriptive statistics
Deterministic model
Discrete metric
Entities
Fields (attributes)
Influence diagram
Interval data
Management science
(MS)

Measure
Measurement
Metric
Model
Objective function
Operations research
(OR)
Optimal solution
Optimization
Ordinal data


Version 1.0 Fall 2013 1-121
Chapter 1 - Key Terms (continued)
Predictive analytics
Prescriptive analytics
Problem solving
Ratio data
Risk
Search Algorithm
Stochastic model
Uncertainty
10 Learning Objectives in Lesson
Lesson : Learning Objectives
1. Be able to define what a metric is.
2. Describe the four groups of data types.
3. Explain the concept of a model.
4. Use influence diagrams to build simple
mathematical models.
5. Use predictive models to compute model
outcomes.
6. Explain difference between uncertainty and
risk.

122 Version 1.0 Fall 2013
10 Learning Objectives in Lesson
Lesson : Learning Objectives
7. Define the terms optimization, objective
function and optimal solution.
8. Explain the difference between a
deterministic and stochastic decision model.
9. List and explain the steps in the problem
solving process.
10. Understand data validation and data
ethics.

123 Version 1.0 Fall 2013
Chapter 1 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013 124
Chapter 1 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013 125
126
Chapter 1 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013
Version 1.0 Fall 2013 127
All Chapter Exercise Solutions
Excel File with Solutions:

Instructor-Led Format: Access to the solutions is provided on Blackboard
for you to check during class.

E-Learning Format: Access to the solutions will be provided after the
exercise due date.

Details: BAS 120 Blackboard site at
www.waketech.edu
BAS 120 Chapter 2: Analytics on
Spreadsheets
128 Version 1.0 Fall 2013
Analysis on Spreadsheets?
Course: BAS 120
Chapter 2 Part 1 of 1
Topics:
Basic Excel Skills.
Excel Functions.
Spreadsheet Add-Ins
& Modeling.
129 Version 1.0 Fall 2013
If you are not already familiar with the Excel ribbon, become familiar with it.
Basic Excel

Tabs, Groups, Buttons, Dialog Box Launcher.

130 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Basic Excel

Formulas | Examples | Note
Basic Math | +, -, *, /, ^ | Order of Operations
Type | = (Formula), (Text) |
Cell Reference | Relative, Absolute $ | F4 Shortcut
Series of basic Excel videos, boot camp.

131 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.1 Demand Predictor

Excel File: Demand Predictor
Student Action:
Calculate demand using the linear model.
Calculate demand using the non-linear model.
Answers: Blackboard and Videos
132 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.2 Demand Predictor

Excel File: Demand Predictor
Student Action:
Change Demand in Model.
Explain Outcome.
Answers: Blackboard and Videos
133 Version 1.0 Fall 2013
Discussion 4: Current State of Business Analytics?
Analysis on Spreadsheets?
Course: BAS 120
Chapter 2 Part 1 of 1
Topics:
Basic Excel Skills.
Excel Functions.
Spreadsheet Add-Ins
& Modeling.
135 Version 1.0 Fall 2013
Excel Basics.
Special Financial
Net Present Value (NPV), statistical functions, SUM, AVERAGE.
Special Lookup and Sorting Tools
VLOOKUP, INDEX, MATCH, COUNT, MIN, MAX.
Logic Functions
IF, AND, OR.
COUNTIF, SUMIF.
Basic Functions

136 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.3 Purchase Order Data

Excel File: Purchase Order
Student Action: Find the
smallest and largest quantity of any item ordered.
total order costs.
average number of months per order for accounts payable.
number of purchase orders placed.
number of orders placed for O-rings.
number of orders with A/P terms shorter than 30 months.
Answers: Blackboard and Videos and at back of notes.
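For readers who want to check their spreadsheet answers another way, the sketch below mirrors the Excel functions this example exercises (MIN, MAX, SUM, AVERAGE, COUNT, COUNTIF). The four rows are made-up stand-ins, not the actual Purchase Order data, and the tuple layout is an assumption for illustration.

```python
from statistics import mean

# Hypothetical rows: (item, quantity, cost_per_order, ap_terms_months).
orders = [
    ("O-Ring", 1300, 3185.00, 30),
    ("Bolt-nut package", 500, 1875.00, 15),
    ("O-Ring", 900, 2205.00, 25),
    ("Gasket", 150, 5850.00, 45),
]

quantities = [qty for _, qty, _, _ in orders]
smallest_qty = min(quantities)                      # like =MIN(range)
largest_qty = max(quantities)                       # like =MAX(range)
total_cost = sum(cost for _, _, cost, _ in orders)  # like =SUM(range)
avg_terms = mean(t for _, _, _, t in orders)        # like =AVERAGE(range)
n_orders = len(orders)                              # like =COUNT(range)

# Conditional counts, like =COUNTIF(range, "O-Ring") and =COUNTIF(range, "<30").
n_orings = sum(1 for item, _, _, _ in orders if item == "O-Ring")
n_short_terms = sum(1 for _, _, _, t in orders if t < 30)

print(smallest_qty, largest_qty, total_cost, avg_terms)
print(n_orders, n_orings, n_short_terms)
```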
137 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.4 Net Present Value

Excel File: None, student creates
Student Action: Develop Excel spreadsheet to calculate NPV
Answers: Blackboard and Videos and at back of notes.
A company is introducing a new product.
The fixed cost for marketing and distribution
is $25,000 and is incurred just prior to
launch. The forecasted net sales revenues
for the first six months are shown at right.
Compute the Net Present Value assuming a
discount rate of 3%.
Net Income from new product
Jan: $2,500
Feb: $4,000
Mar: $6,000
Apr: $8,000
May: $10,000
June: $12,500
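The calculation can be sketched outside Excel as a check. The code below discounts each month's income back to the launch date and subtracts the fixed cost, which is paid up front and so is not discounted. It assumes the 3% rate applies per month and that each amount arrives at the end of its month, the same convention Excel's NPV function uses. The helper name `npv` is illustrative.

```python
def npv(rate, cash_flows):
    """Present value of cash flows received at the end of periods 1..n."""
    return sum(cf / (1 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))


fixed_cost = 25_000       # incurred just prior to launch, so not discounted
monthly_income = [2_500, 4_000, 6_000, 8_000, 10_000, 12_500]  # Jan..June
rate = 0.03               # assumption: 3% per month

project_npv = npv(rate, monthly_income) - fixed_cost
print(f"NPV: ${project_npv:,.2f}")
```

A positive result means the discounted income more than covers the up-front cost.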
138 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.5 IF statement

Excel File: Purchase Orders
Student Action: Calculate
If a purchase order is greater than or equal to 100,000, it is Large; otherwise it is considered Small.
Separate into Large (greater than or equal to 100,000), Medium (greater than or equal to 25,000 but less than 100,000), or Small (less than 25,000).
What other IF statements might you want from this data?
Answers: Blackboard and Videos and at back of notes.
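The three-way split is a nested IF. As a rough sketch of the logic (with the middle band named Medium, and the function name chosen only for illustration), the checks run from the largest threshold down, so each order lands in exactly one band:

```python
def classify_order(cost):
    """Label an order by cost using the thresholds from the example."""
    if cost >= 100_000:
        return "Large"
    if cost >= 25_000:      # reached only when cost < 100,000
        return "Medium"
    return "Small"          # everything below 25,000


for cost in (127_500, 100_000, 25_000, 24_999.99):
    print(cost, classify_order(cost))
```

In Excel the same structure is one IF placed inside the "otherwise" slot of another IF.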
139 Version 1.0 Fall 2013
Version 1.0 Fall 2013 140
Student Exercise 2.3
Data Set: Presidents Inn Guest Database.

Problem: Data set lists customers, rooms occupied, arrival and departure
dates, number of occupants, and daily rate for a small bed and breakfast inn
during one month. Room rates are the same for one or two guests; however
additional guests must pay an additional $20 per person per day for meals.
Guests staying for seven days or or more receive a 10% discount.

Your Job: Modify the spreadsheet to calculate the number of days that each
party stayed at the inn and the total revenue for the length of stay.
If not familiar, utilize the ExcelIsFun videos.
Example 2.6 VLookup

Excel File: Sales Transactions
Student Action: Calculate
If a purchase order is greater than or equal to 100,000, it is Large; otherwise it is considered Small.
Separate into Large (greater than or equal to 100,000), Medium (greater than or equal to 25,000 but less than 100,000), or Small (less than 25,000).
What other IF statements might you want from this data?
Answers: Blackboard and Videos and at back of notes.
141 Version 1.0 Fall 2013
Version 1.0 Fall 2013 142
Student Exercise 2.6
Data Set: Table Below

Problem: A pharmaceutical manufacturer has projected net profits for a new
drug that is being released in the market over the next 5 years:

Year | Net Profit
1 | -$675,000,000
2 | -$445,000,000
3 | -$175,000,000
4 | $125,000,000
5 | $530,000,000

Your Job: Use a spreadsheet to find the net present value of these cash flows
for a discount rate of 8%.
Analysis on Spreadsheets?
Course: BAS 120
Chapter 2 Part 1 of 1
Topics:
Basic Excel Skills.
Excel Functions.
Spreadsheet Add-Ins
& Modeling.
143 Version 1.0 Fall 2013
Add-Ins.
Basic Add-Ins

Excel add-ins:
Analysis ToolPak
Solver
Frontline add-ins:
Analytic Solver
XLMiner
140-day contract.
144 Version 1.0 Fall 2013
Basics
Flexible.
Clear.
Well Documented.
Verification.
Examples in Example Videos.
Understand that people are different; there are
normally several ways to do things in Excel.
Perform testing and checking to ensure your
formulas are working correctly.
Basic Spreadsheet Engineering

145 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.8 Outsourcing Decision

Excel File: Outsource Decision Model
Student Action:
Develop a model.
Utilize the model to determine if you should manufacture or outsource at the following production volumes: 1500, 500, 2500.
Answers: Blackboard and Videos and at back of notes.
146 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.9 Pricing Decision Model

Excel File: Pricing Decision Model
Student Action:
Develop a model.
Determine Total Revenue.
Answers: Blackboard and Videos and at back of notes.
147 Version 1.0 Fall 2013
If not familiar, utilize the ExcelIsFun videos.
Example 2.10 Income Statement

Excel File: Net Income Model
Student Action:
Develop a model.
Determine Net Income.
Add in Data Model.
Develop a Pro Forma Statement.
Answers: Blackboard and Videos and at back of notes.
148 Version 1.0 Fall 2013
Version 1.0 Fall 2013 149
Student Exercise 2.10
Data Set: None

Problem: A manufacturer of mp3 players is preparing to set the price on a new
model. Demand is thought to depend on the price and is represented by the
model:
D = 2,000 - 3P
The accounting department estimates that the total costs can be represented
by
C = 5,000 + 4D

Your Job:
Develop a model with the data above. Apply the principles of spreadsheet
engineering in developing your model. Use the spreadsheet to create a table
for a range of prices to help you identify the price that results in the
maximum revenue.
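The price table this exercise asks for can be sketched in a few lines. The code below assumes the demand model reads D = 2,000 - 3P (the operator is missing in the extracted text, and a minus sign is the reading under which demand falls as price rises). It keeps the inputs in named functions, in the spirit of spreadsheet engineering, and scans whole-dollar prices. All names and the price range are illustrative.

```python
def demand(price):
    """Units demanded at a price (assumed model: D = 2,000 - 3P)."""
    return 2_000 - 3 * price


def total_cost(price):
    """Total cost from the accounting estimate C = 5,000 + 4D."""
    return 5_000 + 4 * demand(price)


def revenue(price):
    """Revenue = price * demand."""
    return price * demand(price)


# One row per whole-dollar price: (price, demand, revenue, profit).
# The upper limit is where assumed demand reaches zero.
table = [
    (p, demand(p), revenue(p), revenue(p) - total_cost(p))
    for p in range(0, 667)
]

best_row = max(table, key=lambda row: row[2])   # row with the largest revenue
print("price, demand, revenue, profit:", best_row)
```

Sorting the table on the profit column instead of revenue would answer a related but different question, since costs also depend on demand.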
Absolute address
Discount rate
Net present value (discounted cash flow)
Pro forma income statement
Relative address
Spreadsheet engineering
Verification
Version 1.0 Fall 2013
2-150
Chapter 2 - Key Terms
Learning the Basics of Excel
Chapter 2: Learning Objectives
1. Find buttons and menus in Excel ribbon.
2. Write formulas in Excel.
3. Apply relative and absolute addressing.
4. Copy functions.
5. Use basic and advanced Excel functions.
6. Use Excel features and spreadsheet
engineering to ensure the quality of your
spreadsheets.

151 Version 1.0 Fall 2013
Chapter 2 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013 152
Chapter 2 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013 153
Chapter 2 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013 154
Chapter 2 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013 155
Details: BAS 120 Blackboard site at
www.waketech.edu
BAS 120 Chapter 3: Data
Visualization
156 Version 1.0 Fall 2013
Visualizing and Exploring Data?
Course: BAS 120
Chapter 3 Part 1 of 1
Topics
Data Visualization.
Data Queries and Sorts.
Statistical Methods to
Summarize Data.
Exploring Pivot Tables.

157 Version 1.0 Fall 2013
What is it ?
Communication of
information.
Why is it important?
Communication and
knowledge are
important.
Data Visualization?

Communication of the Data.
158 Version 1.0 Fall 2013
Data Visualization

Communication of the Data.
159 Version 1.0 Fall 2013
Data Visualization

Communication of the Data.
70-80% of human sensory receptors are visual.
20-30% serve the other 4 senses.
160 Version 1.0 Fall 2013
Data Visualization Trends: Infographics

Communication of the Data.
161 Version 1.0 Fall 2013
Data Visualization Trends: Interactive

Communication of the Data.
162 Version 1.0 Fall 2013
Multiple Options.
Powerful set of
options.
Example videos of
how to create
various charts in
excel.
More tools with
each software.
Excel Tools to Communicate Data.

163 Version 1.0 Fall 2013
4 steps.
Steps to Communicating.

1. Goal.
2. Tool.
3. Highlight your Point.
4. Keep it Simple.
164 Version 1.0 Fall 2013
Highlight your key messages.
What 2-3 messages do we
want our audience to
remember.
Why are these messages
important to our audience,
actions to improve results.
What data supports our key
messages.
Steps 1: Your Goal.

165 Version 1.0 Fall 2013
Guidelines for Choosing Appropriate Chart.
Bar or Column Charts:
Independent Variables.
Compare Values.

Pie Charts:
Ratio.
Parts of a Whole.

Line Charts
Trends.
Growth Patterns.


Step 2: Right Tools.

166 Version 1.0 Fall 2013
Guidelines for Choosing Appropriate Chart.
Step 3: Highlight your points.

167 Version 1.0 Fall 2013
Keep it Simple !
Large Fonts.
Easily understood data.
Avoid long sentences.
Detailed backup information available.
Organized flow of the presentation.
Step 4: Keep Simple.

168 Version 1.0 Fall 2013
Visualizing and Exploring Data?
Course: BAS 120
Chapter 3 Part 1 of 1
Topics
Data Visualization.
Data Queries and Sorts.
Statistical Methods to
Summarize Data.
Exploring Pivot Tables.

169 Version 1.0 Fall 2013
Learn by doing example (videos).
Example Videos on:
Sorting Data.
Pareto Analysis.
80% of profits from 20% of items.
Auto Filter.
Pivot Tables.
Pivot Charts.
Data Sorting.

170 Version 1.0 Fall 2013
Visualizing and Exploring Data?
Course: BAS 120
Chapter 3 Part 1 of 1
Topics
Data Visualization.
Data Queries and Sorts.
Statistical Methods to
Summarize Data.
Exploring Pivot Tables.

171 Version 1.0 Fall 2013
Learn by doing.
Quartiles.

Statistics Methods to Communicate.

Frequency Distributions.

Histogram.

Cumulative Frequencies.

172 Version 1.0 Fall 2013
Visualizing and Exploring Data?
Course: BAS 120
Chapter 3 Part 1 of 1
Topics
Data Visualization.
Data Queries and Sorts.
Statistical Methods to
Summarize Data.
Exploring Pivot Tables.

173 Version 1.0 Fall 2013
Learn by doing example (videos).
Allows your audience to filter and change how
the data is viewed or sorted.
Pivot Tables.

174 Version 1.0 Fall 2013
Visualizing and Exploring Data?
Course: BAS 120
Chapter 3 Part 1 of 1
Topics
Data Visualization.
Data Queries and Sorts.
Statistical Methods to
Summarize Data.
Exploring Pivot Tables.

175 Version 1.0 Fall 2013
Creating Charts in Microsoft Excel
Select the insert tab.
Highlight the data.
Click on chart type, then subtype.



Use chart tools to customize.

Data Visualization
3-176
Figure 3.1
Figure 3.2
Version 1.0 Fall 2013
Example 3.1 Creating a Column Chart

Data Visualization
3-177
Figure 3.3
Highlighted Cells
Version 1.0 Fall 2013
Example 3.1 (continued) Creating a Column Chart
Choose column chart (clustered or stacked).
Add chart title (Alabama Employment).
Rename Series1, Series2, and Series3
(ALL EMPLOYEES, Men, Women).
Data Visualization
3-178
Figure 3.4
Version 1.0 Fall 2013
Example 3.1 (continued) Creating a Column Chart
Data Visualization
3-179
Figure 3.5
Clustered
Column
Chart
Version 1.0 Fall 2013
Example 3.1 (continued) Creating a Column Chart
Data Visualization
3-180
Figure 3.6
Stacked
Column
Chart
Version 1.0 Fall 2013
Example 3.2 Line Chart for U.S. Exports to China
Data Visualization
3-181
Figure 3.7
Version 1.0 Fall 2013
Example 3.3 Pie Chart for Census Data
Data Visualization
3-182
Figure 3.8
Figure 3.9
Version 1.0 Fall 2013
Example 3.4 Area Chart for Energy Consumption
Data Visualization
3-183
Figure 3.10
Version 1.0 Fall 2013
Example 3.5 Scatter Chart for Real Estate Data
Data Visualization
3-184
Figure 3.11
Version 1.0 Fall 2013
Example 3.6
Bubble Chart for Comparing Stock Characteristics
Data Visualization
3-185
Figure 3.12
Version 1.0 Fall 2013
Version 1.0 Fall 2013 186
Student Exercise 3.3
Data Set: Facebook Survey

Problem: The data set provides data gathered from a sample of college
students. Create a scatter diagram showing the relationship between Hours
online per week and Friends.

Your Job:
Create the scatter plot and explain.
Miscellaneous Excel Charts
Stock chart
Surface chart
Doughnut chart
Radar chart
Geographic mapping
Data Visualization
3-187 Version 1.0 Fall 2013
Version 1.0 Fall 2013 188
Student Exercise 3.6
Data Set: State Unemployment Rates

Problem: Construct a column chart for the data to provide a comparison of the
June rate with the historical highs and lows.

Your Job:
Would any other charts be better to visually convey this information?
Example 3.7
Sorting Data in the Purchase Orders Database
3-189
Figure 3.13
Figure 3.14
Sort by Supplier
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Pareto Analysis
An Italian economist, Vilfredo Pareto, observed in
1906 that a large proportion of the wealth in Italy
was owned by a small proportion of the people.
Similarly, businesses often find that a large
proportion of sales come from a small proportion
of customers.
A Pareto analysis involves sorting data and
calculating cumulative proportions.
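The two steps named above, sorting and calculating cumulative proportions, can be sketched as follows. The item names and values are invented for illustration; only the procedure reflects the slide.

```python
# Hypothetical inventory value per item (not course data).
values = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}

total = sum(values.values())
ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)

rows = []          # (item, cumulative share of items, cumulative share of value)
running = 0
for rank, (item, value) in enumerate(ranked, start=1):
    running += value
    rows.append((item, rank / len(ranked), running / total))

for item, pct_items, pct_value in rows:
    print(f"{item}: top {pct_items:.0%} of items hold {pct_value:.0%} of value")
```

Reading down the output shows how quickly the value accumulates in the first few items.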


3-190
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Version 1.0 Fall 2013 191
Student Exercise 3.9
Data Set: Automobile Quality

Problem: -

Your Job:
Sort the data from lowest to highest number of problems per 100 vehicles
using the sort capabilities in Excel.
Example 3.8 Applying the Pareto Principle

3-192
Data Queries: Using Sorting and Filtering
Figure 3.15
75% of the bicycle inventory value comes from 40% (9/24) of items.
Sort by
Version 1.0 Fall 2013
Example 3.9 Filtering Records by Item Description
Highlight A3:J97
Data tab
Sort & Filter group
Filter
Click on the D3
dropdown arrow.
Select Bolt-nut
package to filter out
all other items.

3-193
Figure 3.16
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Example 3.9 (continued)
Filtering Records by Item Description
Filter results for the bolt-nut package

3-194
Figure 3.17
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Example 3.10 Filtering Records by Item Cost
To identify items that
cost at least $200
Click on dropdown
arrow for item cost
Number Filters
Greater Than Or
Equal To


3-195
Figure 3.18
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Example 3.10 (continued) Filtering by Item Cost Custom
AutoFilter dialog box
Click OK
Only items
costing at least
$200 are then
displayed.
3-196
Figure 3.19
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Version 1.0 Fall 2013 197
Student Exercise 3.12
Data Set: Sales Transaction

Problem: -

Your Job:
Filter the data to extract all orders that used PayPal, and all orders over $100
that used a credit card.
AutoFilter criteria are based on the data type.
Number Filters include numerical criteria.
Date Filters include tomorrow, next week, etc.

AutoFilter can be used sequentially.
First filter by one variable.
Then filter those data by another variable.
3-198
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
Analytics in Practice: Discovering Value
of Data Analysis at Alders International

Duty free operations at airports, seaports, etc.
Maintain a data warehouse to track point-of-sale
information and inventory levels.
Pareto analysis revealed that 80% of profits were
generated from 20% of their product lines.
Allows selective elimination of less profitable items.

3-199
Data Queries: Using Sorting and Filtering
Version 1.0 Fall 2013
A statistic is a summary measure of data.
Descriptive statistics are methods that describe and
summarize data.
Microsoft Excel supports statistical analysis in two ways:
1. Statistical functions
2. Analysis ToolPak add-in for PCs
(for Macs, StatPlus is similar)

3-200
Statistical Methods for Summarizing Data
Statistical methods are essential to Business Analytics
Version 1.0 Fall 2013
Example 3.11 Constructing a Frequency Distribution
for Items in the Purchase Order Database
3-201
Statistical Methods for Summarizing Data
Figure 3.20
Copy Column D (Item Description) to Column A in a new worksheet
Version 1.0 Fall 2013
Example 3.11 (continued) Constructing a Frequency
Distribution for Items in the Purchase Order Database
3-202
Statistical Methods for Summarizing Data
Figure 3.22 Figure 3.21
Version 1.0 Fall 2013
Example 3.11 (continued) Constructing a Frequency
Distribution for Items in the Purchase Order Database
3-203
Statistical Methods for Summarizing Data
Figure 3.23
Version 1.0 Fall 2013
Example 3.12 Constructing a Relative Frequency
Distribution for Items Purchased
3-204
Statistical Methods for Summarizing Data
Figure 3.24
Compute relative
frequencies by
dividing each
frequency by 94.
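As a small illustration of the same arithmetic, the sketch below counts how often each item appears and divides each count by the number of records. The slide divides by 94 because the Purchase Orders data has 94 records. The six-item list here is made up.

```python
from collections import Counter

# Hypothetical item descriptions, one per purchase order.
items = ["O-Ring", "Gasket", "O-Ring", "Bolt-nut package", "O-Ring", "Gasket"]

frequency = Counter(items)          # frequency distribution
n = len(items)                      # number of records (94 in the slide's data)
relative_frequency = {item: count / n for item, count in frequency.items()}

for item, count in frequency.most_common():
    print(f"{item}: {count} ({relative_frequency[item]:.1%})")
```

The relative frequencies always sum to 1, which is a quick check on the calculation.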
Version 1.0 Fall 2013
Version 1.0 Fall 2013 205
Student Exercise 3.15
Data Set: Sales Transaction

Problem: -

Your Job:
Use the Histogram tool to construct a frequency distribution of sales
amounts.
Example 3.13 Frequency and Relative Frequency Distribution
for A/P Terms
3-206
Statistical Methods for Summarizing Data
Figure 3.26
Figure 3.25
Version 1.0 Fall 2013
Excel's Histogram Tool
Using the Analysis ToolPak

Data
Data Analysis
Histogram

Fill in the Input Range and Bin Range (optional).
Choose Labels if columns have header rows.
Choose Chart Output.
3-207
Statistical Methods for Summarizing Data
Figure 3.27
Version 1.0 Fall 2013
Example 3.14
Using the Histogram Tool for A/P Terms

A/P data in H3:H97

Bins below in H99:H103
Month
15
25
30
45
3-208
Statistical Methods for Summarizing Data
Figure 3.28
Version 1.0 Fall 2013
Example 3.14 (continued)
Using the Histogram Tool for A/P Terms
3-209
Statistical Methods for Summarizing Data
Figure 3.29
Table above is
not linked to
chart.
Version 1.0 Fall 2013
Example 3.15 Constructing a Frequency Distribution and
Histogram for Cost Per Order
3-210
Statistical Methods for Summarizing Data
5 groups with a
$26,000 group width
Figure 3.30
Version 1.0 Fall 2013
Example 3.15 (continued) Constructing a Frequency Distribution
and Histogram for Cost Per Order
3-211
Statistical Methods for Summarizing Data
Figure 3.31
10 groups with a
$13,000 group width
Version 1.0 Fall 2013
Example 3.16 Computing Cumulative Relative Frequencies for
the Cost Per Order Data
3-212
Statistical Methods for Summarizing Data
Ogive
Figure 3.33
Figure 3.32
Version 1.0 Fall 2013
Example 3.17 Computing Percentiles
Compute the 90th percentile for cost per order in the
Purchase Orders Data.
Rank of kth percentile = nk/100 + 0.5
n = 94 observations
k = 90
Rank of 90th percentile = 94(90)/100 + 0.5
= 85.1 (round to 85)
Value of the 85th observation = $74,375
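The rank calculation on this slide (rank = nk/100 + 0.5, rounded to a whole position in the sorted data) can be sketched as a small function. Rounding halves upward and clamping the position to the data range are assumptions made here for the sketch; the slide only shows 85.1 rounding to 85.

```python
def percentile_by_rank(data, k):
    """Value at rank n*k/100 + 0.5 in the sorted data (k between 0 and 100)."""
    ordered = sorted(data)
    n = len(ordered)
    rank = n * k / 100 + 0.5
    position = int(rank + 0.5)               # round to nearest whole rank
    position = min(max(position, 1), n)      # keep within 1..n (assumption)
    return ordered[position - 1]             # ranks start at 1


# Stand-in data: 94 values so the rank matches the slide (85.1 -> 85).
sample = list(range(1, 95))
print(percentile_by_rank(sample, 90))
```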

3-213
Statistical Methods for Summarizing Data
Version 1.0 Fall 2013
Example 3.18 Computing Percentiles in Excel
Compute the 90th percentile for cost per order.
Excel function for the kth percentile:
=PERCENTILE.INC(array, k)
=PERCENTILE.INC(G4:G97, 0.90)
= $73,737.50
Excel does not use the formula on the previous
slide.
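The reason the two answers differ is that PERCENTILE.INC interpolates between neighboring values rather than picking a single ranked observation. The sketch below shows that style of calculation, placing the kth percentile at fractional position k(n - 1) counted from zero in the sorted data. It is an illustration of the interpolation idea, not Excel's own code, and the function name is hypothetical.

```python
def percentile_inc(data, k):
    """Interpolated percentile for k between 0 and 1 (inclusive method)."""
    ordered = sorted(data)
    n = len(ordered)
    position = k * (n - 1)          # zero-based fractional position
    lower = int(position)           # index of the value just below
    fraction = position - lower     # how far toward the next value
    if lower + 1 >= n:              # k == 1 lands on the maximum
        return float(ordered[-1])
    return ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])


print(percentile_inc([10, 20, 30, 40], 0.90))
```

With only four values, the 90th percentile falls 70% of the way from 30 to 40, which a rank-based method could never return.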
3-214
Statistical Methods for Summarizing Data
Version 1.0 Fall 2013
Example 3.19 Excel's Rank and Percentile Tool
Data
Data Analysis
Rank and Percentile

90.3rd percentile = $74,375
(same result as manually computing the 90th percentile)
3-215
Statistical Methods for Summarizing Data
Figure 3.34
Version 1.0 Fall 2013
Example 3.20 Computing Quartiles in Excel
Compute the Quartiles of the Cost per Order data
Excel function for quartiles:
=QUARTILE.INC(array, quart)
=QUARTILE.INC(G4:G97, 1) = $6,757.81
=QUARTILE.INC(G4:G97, 2) = $15,656.25
=QUARTILE.INC(G4:G97, 3) = $27,593.75
=QUARTILE.INC(G4:G97, 4) = $127,500.00
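Quartiles can also be checked outside Excel. Python's standard library offers `statistics.quantiles` with an inclusive method, which should generally agree with QUARTILE.INC for quartiles 1 to 3; quartiles 0 and 4 are simply the minimum and maximum. The data below is a made-up list, not the Cost per Order column.

```python
from statistics import quantiles

# Hypothetical cost-per-order values.
costs = [1, 2, 3, 4, 5]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(costs, n=4, method="inclusive")

print("minimum (quart 0):", min(costs))
print("Q1, median, Q3:", q1, q2, q3)
print("maximum (quart 4):", max(costs))
```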

3-216
Statistical Methods for Summarizing Data
Version 1.0 Fall 2013
Example 3.21 Constructing a Cross-Tabulation
Sales Transactions database





Identify the number (and percentage) of books and DVDs
ordered by region.
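A cross-tabulation is a count of records for each combination of two categories. The sketch below builds one for region by product and then converts each row to percentages. The six transactions are invented for illustration and are not the Sales Transactions data.

```python
from collections import Counter

# Hypothetical (region, product) pairs, one per transaction.
sales = [
    ("East", "Book"), ("East", "DVD"), ("West", "Book"),
    ("East", "Book"), ("North", "DVD"), ("West", "DVD"),
]

counts = Counter(sales)
regions = sorted({region for region, _ in sales})
products = sorted({product for _, product in sales})

# Counts by region (rows) and product (columns); missing pairs count as 0.
table = {r: {p: counts[(r, p)] for p in products} for r in regions}

# Share of each product within its region.
row_pct = {
    r: {p: table[r][p] / sum(table[r].values()) for p in products}
    for r in regions
}

for r in regions:
    print(r, table[r], {p: f"{row_pct[r][p]:.0%}" for p in products})
```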
3-217
Statistical Methods for Summarizing Data
Figure 3.35
Version 1.0 Fall 2013
Example 3.21 (continued) Constructing a Cross-Tabulation
3-218
Statistical Methods for Summarizing Data
Table 3.1
Table 3.2
Version 1.0 Fall 2013
Example 3.21 (continued) Constructing a Cross-Tabulation







Excel's PivotTable (covered next) makes this easy.
3-219
Statistical Methods for Summarizing Data
Figure 3.36
Table 3.1
Version 1.0 Fall 2013
Version 1.0 Fall 2013 220
Student Exercise 3.21
Data Set: Atlanta Airline Data

Problem: -

Your Job:
Find the 10th and 90th percentiles and the 1st and 3rd quartiles for the time
differences between the scheduled and actual arrival times.
Data
Tables
PivotTable
Follow wizard steps.
PivotTables allow:
Quick creation of
cross tabulations
Numerous custom-
made summary
tables and charts

3-221
Exploring Data Using PivotTables
Figure 3.37
Version 1.0 Fall 2013
PivotTable Field List
Select the fields for:
Report Filter
Column Labels
Row Labels
Values
Or, before choosing
PivotTable, you can
select a cell in the data
and let Excel prepare a
default PivotTable.
3-222
Exploring Data Using PivotTables
Figure 3.37
Version 1.0 Fall 2013
Example 3.22
Creating a PivotTable
Default PivotTable for
Regional Sales by
Product
(sum of CustID is
meaningless)
3-223
Exploring Data Using PivotTables
Figure 3.38
Version 1.0 Fall 2013
Example 3.22 (continued) Creating a PivotTable
Pivot Table Tools
Options
Active Field
Field Settings
Change summarization
method in Value Field
Settings dialog box
Select Count
3-224
Exploring Data Using PivotTables
Figure 3.39
Version 1.0 Fall 2013
Example 3.22 (continued) Creating a PivotTable
3-225
Exploring Data Using PivotTables
Figure 3.40
Table 3.1
PivotTable for Count of
Regional Sales by
Product


PivotTable results match
those shown earlier in
Table 3.1.
Version 1.0 Fall 2013
Drag Source into the
Row Labels box.

PivotTable for Sales
by Region, Product,
and Order Source

3-226
Exploring Data Using PivotTables
Figure 3.41
Example 3.22 (continued)
Creating a PivotTable
Version 1.0 Fall 2013
Example 3.23
Using the Pivot
Table Report Filter
Drag Payment into
Report Filter box.

PivotTable Filtered by
Payment Type.
3-227
Exploring Data Using PivotTables
Figure 3.42
Version 1.0 Fall 2013
Example 3.23 (continued)
Using the PivotTable Report Filter
Click on the drop-down arrow in row 1.

3-228
Exploring Data Using PivotTables
Figure 3.43
Choose Credit-Card.
Obtain this cross-tabulation
PivotTable for credit card
transactions.
Version 1.0 Fall 2013
Example 3.24 A PivotChart for Sales Data
Create a chart using the PivotTable for
Sales by Region, Product, and Order Source.
Insert
Column Chart
To display only Book
data, click on the
Product button and
deselect DVD.
3-229
Exploring Data Using PivotTables
Figure 3.44
Version 1.0 Fall 2013
Version 1.0 Fall 2013 230
Student Exercise 3.28
Data Set: Sales Transaction

Problem: -

Your Job:
Use Pivot Tables to find the number of sales transactions by product and
region, total amount of revenue by region, and total revenue by region and
product in the above data set.
3-231
Key Terms
Area chart
Bar chart
Bubble chart
Column chart
Contingency table
Cross-tabulation
Cumulative relative
frequency
Cumulative relative
frequency distribution


Data profile (fractile)
Descriptive statistics
Doughnut chart
Frequency distribution
Histogram
kth percentile
Line chart
Ogive
Pareto analysis
Pie chart
Version 1.0 Fall 2013
3-232
Chapter 3 - Key Terms (continued)
PivotChart
PivotTable
Quartile
Radar chart
Relative frequency
Relative frequency
distribution
Scatter chart
Statistic
Statistics

Stock chart
Surface chart
Version 1.0 Fall 2013
Learning the Basics of Excel Display of Data
Chapter 3: Learning Objectives
1. Create Microsoft charts.
2. Determine appropriate chart to visualize
different types of data.
3. Sort a data set in Excel spreadsheet.
4. Apply Pareto Principles to analyze data.
5. Use Excel Autofilter to identify records.
6. Construct a frequency distribution and
relative frequency distribution for both
discrete and continuous data.


233 Version 1.0 Fall 2013
Learning the Basics of Excel Display of Data
Chapter 3: Learning Objectives
7. Find percentile and quartiles for a data set.
8. Use pivot tables to explore and summarize
data.
9. Use pivot tables to construct a cross-
tabulation.
10. Display the results of pivot tables using pivot
charts.


234 Version 1.0 Fall 2013
235
Chapter 3 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013
236
Chapter 3 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013
Details: BAS 120 Blackboard site at
www.waketech.edu
BAS 120 Chapter 4: Descriptive
Statistics
237 Version 1.0 Fall 2013
Descriptive Statistical Measures
Course: BAS 120
Chapter 4
Topics
Population vs. Sample
Measure of Location.
Measures of Dispersion.
Measures of Shape.
Grouped Data.
Measures of Association.
Outliers.
Statistical Thinking.
238 Version 1.0 Fall 2013
Discussion 5: Day in the Life of a Business Analyst?
Often use samples.
Population: all the items
Sample: subset of the population.

Population vs. Sample
Parameters
Statistics
240 Version 1.0 Fall 2013
Learn the language.
Mean or Average. (Location)
μ (mu): population mean.
x̄ (x-bar): sample mean.
Standard Deviation. (Dispersion)
σ (sigma): population standard deviation.
s (lowercase s): sample standard deviation.
Size of Population or Sample
N for population and n for sample.
Proportions
π for population proportion and p for sample proportion.
Σ (capital sigma) for summation.

Notation
241 Version 1.0 Fall 2013
Descriptive Statistical Measures
Course: BAS 120
Chapter 4
Topics
Population vs. Sample
Measure of Location.
Measures of Dispersion.
Measures of Shape.
Grouped Data.
Measures of Association.
Outliers.
Statistical Thinking.
242 Version 1.0 Fall 2013
Where is the location : the center!
To describe data
we are interested
in location: where
is the center? Or
where is most of
our data?

Measures of Location
243 Version 1.0 Fall 2013
Where is our data located.
Key Measure | Description | Excel
Mean | What most of us call the average. | AVERAGE(data range)
Median | Middle score | MEDIAN(data range)
Mode | Most frequent score | MODE.SNGL(data range)
Purpose is to describe the location of most of our data.
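The three measures in the table can be tried on a small list. The sketch below uses Python's standard `statistics` module as a stand-in for the Excel functions; the seven values are made up for illustration.

```python
from statistics import mean, median, mode, multimode

# Hypothetical A/P terms in months.
terms = [15, 30, 30, 25, 45, 30, 15]

print("mean:  ", round(mean(terms), 2))   # like =AVERAGE(range)
print("median:", median(terms))           # like =MEDIAN(range)
print("mode:  ", mode(terms))             # like =MODE.SNGL(range)
print("all modes:", multimode(terms))     # like =MODE.MULT(range)
```

Here the mean sits below the median and mode because the two low values of 15 pull it down, a small reminder that the three measures answer slightly different questions.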
Measures of Location
244 Version 1.0 Fall 2013
Example 4.1 Computing Mean Cost per Order
(Purchase Orders data)
Using formula:
Mean = $2,471,760/94
= $26,295.32
Measures of Location
4-245
Figure 4.1
Version 1.0 Fall 2013
Example 4.1 (continued)
Computing Mean Cost per Order

Applying Formula
=Sum(B2:B95)/Count(B2:B95)

Using Average Function
=Average(B2:B95)


Measures of Location
4-246
Figure 4.2
Version 1.0 Fall 2013
Example 4.2 Finding the Median Cost per Order
(Purchase Orders data)
Median - middle value of the data when arranged from least to
greatest

Measures of Location
4-247
Figure 4.3
Sort the data in column B.
Since n = 94,
Median = $15,656.25
= average of the 47th and 48th observations.

=MEDIAN(B2:B95)
=AVERAGE(B48,B49)

Version 1.0 Fall 2013
Example 4.3 Finding the Mode of A/P terms
(Purchase Orders data)
Mode - observation that occurs most often or, for
grouped data, the group with the greatest frequency.
Mode of A/P terms:
= 30 months
=MODE.SNGL(H4:H97)

For multiple modes: =MODE.MULT(data range)

Measures of Location
4-248
Figure 3.29
Version 1.0 Fall 2013
Example 4.4 Computing the Midrange
(Purchase Orders data)
Midrange = Average of greatest and least values
Use the Excel MIN and MAX functions or Sort the
data and find them easily.
Cost per order midrange:
= ($68.78 + $127,500)/2
= $63,784.39
=AVERAGE(MIN(B2:B95), MAX(B2:B95))
Measures of Location
4-249 Version 1.0 Fall 2013
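As a cross-check outside Excel, the four measures of location above can be sketched in Python. This is only an illustration, not part of the course materials: the `costs` list is hypothetical and is not the Purchase Orders data.

```python
import statistics

# Hypothetical order costs (an assumption for illustration only).
costs = [120.0, 250.0, 250.0, 400.0, 980.0]

mean_cost = statistics.mean(costs)        # same idea as =AVERAGE(range)
median_cost = statistics.median(costs)    # same idea as =MEDIAN(range)
mode_cost = statistics.mode(costs)        # same idea as =MODE.SNGL(range)
midrange = (min(costs) + max(costs)) / 2  # average of least and greatest values

print(mean_cost, median_cost, mode_cost, midrange)
```

Note how the single large value (980) pulls the mean and midrange above the median, which is why the median is often quoted for skewed data.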
Example 4.5 Quoting Computer Repair Times
Data set includes 250 repair times for customers.
Measures of Location
4-250
Figure 4.4
What repair time would be reasonable to quote to a new customer?
Median repair time is 2 weeks; Mean and Mode are about 15 days.
Let's look at a histogram to get a better idea.
Version 1.0 Fall 2013
Example 4.5 (continued)
Quoting Computer Repair Times

Measures of Location
4-251
Figure 4.5
90% are completed within 3 weeks
Version 1.0 Fall 2013
What kind of range or dispersion is in the data?
Measures of Dispersion
253 Version 1.0 Fall 2013
How much dispersion is in our data.
Measures of Dispersion
Key Measure | Description | Excel
Range | Difference between the highest score and the lowest score. | MAX(data range) - MIN(data range)
Standard Deviation | How far away from the average (mean) our data is. | STDEV.S(data range)
Z score | (One score (x) - mean) / standard deviation | STANDARDIZE(x, mean, standard deviation)
Coefficient of Variation | Standard deviation divided by the mean. | STDEV.S(data range) / AVERAGE(data range)
Purpose: describe the amount of dispersion in our data.
254 Version 1.0 Fall 2013
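The dispersion measures in the table can be sketched in Python as follows. This is a minimal sketch with hypothetical data; `statistics.stdev` is the sample standard deviation, which corresponds to STDEV.S.

```python
import statistics

# Hypothetical scores (an assumption for illustration only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

data_range = max(data) - min(data)       # like MAX(range) - MIN(range)
mean = statistics.mean(data)
stdev = statistics.stdev(data)           # sample standard deviation (n - 1)
z_of_max = (max(data) - mean) / stdev    # like STANDARDIZE(x, mean, stdev)
cv = stdev / mean                        # coefficient of variation

print(data_range, mean, round(stdev, 4), round(z_of_max, 4), round(cv, 4))
```

Because the coefficient of variation divides by the mean, it is unit-free and can be used to compare the relative spread of data sets measured on different scales.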
How much dispersion is in our data.
Standard Deviation: Population

255 Version 1.0 Fall 2013
How much dispersion is in our data.
Standard Deviation: Sample

256 Version 1.0 Fall 2013
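The formulas shown on the two standard deviation slides did not survive in this copy. For reference, the standard textbook definitions, written with the notation introduced earlier (N and μ for the population, n and x̄ for the sample), are given below. They are supplied here as an assumption about what the slides displayed.

```latex
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

The only structural difference is the divisor: N for a population and n - 1 for a sample.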
Example 4.6 Computing the Range
(Purchase Orders data)
For the cost per order data:
Maximum = $127,500
Minimum = $68.78
Range = $127,431.22
=MAX(B2:B95)-MIN(B2:B95)
Measures of Dispersion
4-257 Version 1.0 Fall 2013
Example 4.7 Computing the Interquartile Range
(Purchase Orders data)
For the cost per order data:
Third Quartile = Q3 = $27,593.75
=QUARTILE.INC(B2:B95,3)
First Quartile = Q1 = $6,757.81
=QUARTILE.INC(B2:B95,1)
Interquartile Range = Q3 - Q1 = $20,835.94
Measures of Dispersion
4-258
The middle 50% of the data is concentrated in a small range of $20,836.
The range of the full data set is affected by extreme values.
Version 1.0 Fall 2013
Example 4.8 Computing the Variance
(Purchase Orders data)







=VAR.S(B2:B95)

Measures of Dispersion
4-259
Figure 4.6
Version 1.0 Fall 2013
Measures of Dispersion
4-260
Figure 4.6
Version 1.0 Fall 2013
Example 4.10 Applying Chebyshev's Theorem
(Purchase Orders data)
For the cost-per-order data:
When k = 2, 1 - 1/k^2 = 75%
Mean ± 2(Stdev.) = [-$33,390.34, $85,980.98]
89 of the 94 data values (94.68%)
When k = 3, 1 - 1/k^2 = 89%
Mean ± 3(Stdev.) = [-$63,233.17, $115,823.81]
92 of the 94 data values (97.9%)


Measures of Dispersion
4-261 Version 1.0 Fall 2013
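The Chebyshev bounds and intervals in Example 4.10 can be reproduced with a short Python sketch. The mean comes from Example 4.1; the standard deviation of 29,842.83 is not stated on the slide and is inferred here from the k = 2 interval, so treat it as an assumption.

```python
mean = 26295.32
stdev = 29842.83  # assumption: inferred from the k = 2 interval, not given directly

def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    return 1 - 1 / k**2

def interval(center, spread, k):
    """Interval center +/- k * spread."""
    return (center - k * spread, center + k * spread)

for k in (2, 3):
    low, high = interval(mean, stdev, k)
    print(k, round(chebyshev_bound(k), 4), round(low, 2), round(high, 2))
```

The observed proportions on the slide (94.68% and 97.9%) are well above the bounds, which is expected: Chebyshev's theorem gives a guaranteed minimum, not an estimate.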
Example 4.11 Using the Empirical Rule to Measure
the Capability of a Manufacturing Process
Cp = 0.57
Cp < 1 indicates variation is wider than specified.
Want Cp ≥ 1 or Cp ≥ 1.5

Measures of Dispersion
4-262
Figure 4.8
Version 1.0 Fall 2013
Example 4.11 (continued)
3+3+1+1 = 8 of 200 (4%) fall outside the specification limits of 4.8 and 5.2.
3rd Empirical Rule: approximately 0.3% of the data falls outside 3 standard deviations of the mean.
Chebyshev's Theorem: at most about 11% fall outside.
Measures of Dispersion
4-263
Figure 4.9
Version 1.0 Fall 2013
Example 4.12 Computing z-scores
(Purchase Orders data)

=STANDARDIZE(x, mean, standard deviation)


Measures of Dispersion
4-264
Figure 4.10
Version 1.0 Fall 2013
Example 4.13 Applying the Coefficient of Variation
Intel (INTC) is slightly riskier than the other stocks.
The Index fund has the least risk (lowest CV).

Measures of Dispersion
4-265
Figure 4.11
Version 1.0 Fall 2013
Version 1.0 Fall 2013 266
Student Exercise 4.3
Data Set: Tablet Computer Sales

Problem: -

Your Job:
Find the average number, standard deviation, and interquartile range of
units sold per week. Show that Chebyshev's theorem holds for the data and
determine how accurate the empirical rules are.
Skewness
Measures of Shape
268 Version 1.0 Fall 2013
Kurtosis
Measures of Shape
269 Version 1.0 Fall 2013
What is the shape of our data.
Measures of Shape
Key Measure | Description | Excel
Kurtosis | How flat or tall. CK < 3: flatter, with wide dispersion. CK > 3: taller, with less dispersion. | KURT(data range)
Coefficient of Skewness | Is the mean to the right or left of most of the data? CS positive: positively skewed. | SKEW(data range)
Note: Excel's KURT reports excess kurtosis (a normal distribution scores 0), so compare its result with 0 rather than 3.
Purpose: describe the shape of our data.
270 Version 1.0 Fall 2013
Version 1.0 Fall 2013 271
Student Exercise 4.7
Data Set: Colleges and Universities

Problem: -

Your Job:
Compute descriptive statistics for liberal arts colleges and research
universities. Compare the two types of colleges. What can you conclude?
Grouped Data Special Formulas
Situation: Raw data is grouped.
Special formulas are used to arrive at the mean and standard deviation. Example.




Grouped Data
273 Version 1.0 Fall 2013
Proportions will have different statistics formulas
Proportion: % of the time we are successful.
1 - this proportion is the % of the time we are not successful.
If we had data on on-time deliveries, we could divide the number of on-time deliveries by the total number of deliveries to arrive at the proportion of on-time deliveries.
Another example: 12 defects in a sample of 1,000.
Proportion = 12/1000 or 1.2%

Categorical Data
274 Version 1.0 Fall 2013
Version 1.0 Fall 2013 275
Student Exercise 4.13
Data Set: Bicycle Inventory

Problem: -

Your Job:
Find the proportion of bicycles that sell for less than $200.
Version 1.0 Fall 2013 276
Student Exercise 4.17
Data Set: Sales Transaction

Problem: -

Your Job:
Using Pivot Tables find the average and standard deviation of sales. Also
find the average sales by source (Web or e-mail). Do you think this
information could be useful in advertising?
What is the association of our data.
Measures of Association
Two or more sets of data.
Does an association exist between the sets of data?
278 Version 1.0 Fall 2013
What is the association of our data.
Key Measure | Description | Excel
Covariance | The linear association between 2 variables. | COVARIANCE.S(array1, array2) for a sample; COVARIANCE.P(array1, array2) for a population
Correlation | The linear association between 2 variables, which does not depend on the unit of measure. | CORREL(array1, array2)

Purpose: describe the amount of association in our data.
Measures of Association
279 Version 1.0 Fall 2013
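To see why correlation does not depend on the unit of measure while covariance does, here is a minimal Python sketch using hypothetical paired data. The function names are illustrative only and are not Excel or course functions.

```python
import math

def sample_covariance(x, y):
    """Sample covariance (divides by n - 1), the same idea as COVARIANCE.S."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation = covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

# Hypothetical paired observations (an assumption for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_rescaled = [100 * value for value in x]  # same data in a different unit

print(sample_covariance(x, y), sample_covariance(x_rescaled, y))
print(round(correlation(x, y), 4), round(correlation(x_rescaled, y), 4))
```

Rescaling x multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is easier to interpret across data sets.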
Follow example videos.
Excel Correlation and Covariance Tools
Measures of Association
280 Version 1.0 Fall 2013
4-281
Example 4.16 Computing Statistical Measures from Frequency
Distributions (Computer Repair Times)

Figure 4.16
Version 1.0 Fall 2013
4-282
Example 4.17 Computing Descriptive Statistics for a Grouped
Frequency Distribution

Figure 4.17
Descriptive Statistics for Grouped Data
We can use group
midpoints as
approximate
percentages of
household income
spent on rent
(except in rows
13, 14).
Version 1.0 Fall 2013
4-283
Example 4.17 (continued)
Our calculations indicate that the typical renter spends about
30% of household income on rent.

Figure 4.18
Descriptive Statistics for Grouped Data
Version 1.0 Fall 2013
Excel Descriptive Statistics Tool
4-284
Example 4.15 Using the Descriptive Statistics Tool

Figure 4.15
Results of the
Analysis Toolpak
do not change
when changes are
made to the data
itself.
Descriptive Statistics for Cost per order and A/P terms

Version 1.0 Fall 2013
Descriptive Statistics for Categorical Data: The
Proportion
4-285
Example 4.18 Computing a Proportion
Proportion of orders placed by Spacetime Technologies
=COUNTIF(A4:A97, "Spacetime Technologies")/94
= 12/94 = 0.128

Figure 4.1
Version 1.0 Fall 2013
Measures of Association
4-286
Example 4.20 Computing the Covariance
Scatterplot of the Colleges and Universities data

Figure 4.22
Version 1.0 Fall 2013
Measures of Association
4-287
Example 4.20 (continued)
Computing the Covariance

Figure 4.23
Version 1.0 Fall 2013
Measures of Association
4-288
Figure 4.24
Version 1.0 Fall 2013
Measures of Association
4-289
Example 4.21 Computing the Correlation Coefficient (Colleges
and Universities data)
Graduation % and Median SAT

Figure 4.25
Version 1.0 Fall 2013
Measures of Association
4-290
Excel Correlation Tool
Data
Data Analysis
Correlation

Excel computes the
correlation coefficient
between all pairs of variables in the Input Range.
Input Range Data must be in contiguous columns.

Figure 4.26
Version 1.0 Fall 2013
Measures of Association
4-291
Example 4.22 Using the Correlation Tool
(Colleges and Universities data)




Lower acceptance rate, higher median SAT
Lower acceptance rate, higher % top 10 HS students
Lower acceptance rate, higher graduation rate
Higher median SAT, higher graduation rate

Figure 4.27
Version 1.0 Fall 2013
Version 1.0 Fall 2013 292
Student Exercise 4.19
Data Set: Freshman College Data

Problem: -

Your Job:
Use PivotTables to examine differences in student high school
performance and first year retention among different colleges and
universities. What conclusions do you reach?
Outliers in data.
Outliers
Subjective, no formal definition.
Values more than 3-4 standard deviations from the mean may require some investigation.
Document.
294 Version 1.0 Fall 2013
Outliers
4-295
The Mean and Range are sensitive to outliers.
How do we identify outliers?
Some possible methods to identify outliers are:
z-scores greater than +3 or less than -3
extreme outliers are more than 3*IQR to the left of Q1 or right of Q3
mild outliers are between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
There is no standard definition of what constitutes an outlier.
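The IQR-based rules above can be sketched in Python. This is a minimal sketch under stated assumptions: the data are hypothetical, the `classify` helper is illustrative, and `statistics.quantiles` with `method="inclusive"` is used because it interpolates the same way as QUARTILE.INC.

```python
import statistics

def classify(value, q1, q3):
    """Label a value as an extreme outlier, mild outlier, or neither."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "mild"
    return "none"

# Hypothetical data with one suspicious value (an assumption for illustration).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")

labels = {x: classify(x, q1, q3) for x in data}
print(q1, q3, labels[50])
```

A flagged value is only a candidate for investigation; whether it is an error or a genuine observation still has to be decided and documented.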


Version 1.0 Fall 2013
Outliers
4-296
Example 4.23 Investigating Outliers
(Home Market Value data)






Are any homes outliers?

Figure 4.28
Note that the
complete data set has
43 observations.
Version 1.0 Fall 2013
Outliers
4-297
Example 4.23 (continued) Investigating Outliers


Figure 4.29
None of the z-scores for Square Feet or Market Value exceed 3.

Version 1.0 Fall 2013
Outliers
4-298
Example 4.23 (continued) Investigating Outliers


Figure 4.30
The house with a
market value near
$120,000 and
square footage
near 1600 does not
fall in line with the
rest of the data.
Version 1.0 Fall 2013
Version 1.0 Fall 2013 299
Student Exercise 4.23
Data Set: Beverage Sales

Problem: -

Your Job:
Compute the covariance and correlation between temperature and sales.
One Stop Shopping.
Excel Tool

Highlight your data, go to the Data Analysis tool, and select Descriptive Statistics.
Statistical Thinking and Tools
301 Version 1.0 Fall 2013
Statistics in PivotTables
4-302
Statistical Measure Choices in PivotTables
Under Value Field Settings:
Average
Max and Min
Product
Standard deviation
Variance

Figure 4.19
Version 1.0 Fall 2013
Statistics in PivotTables
4-303
Example 4.19 Statistical Measures in PivotTables
(Credit Risk Data)

Figure 4.20
Fields: Checking
Savings
Job Classif.
Row Labels: Job
Values:
Average Checking
Average Savings
Version 1.0 Fall 2013
Statistical Thinking in Business Decisions
4-304
Example 4.24 Applying Statistical Thinking
Average infection rate = 0.0072
Standard deviation = 0.0053

Figure 4.31
Version 1.0 Fall 2013
Statistical Thinking in Business Decisions
4-305
Example 4.24 (continued) Applying Statistical Thinking (Surgery
Infections data)
Control limits set at z-scores of -3 and +3
Control limits: -0.009 (set to 0) and 0.023

Figure 4.32
Version 1.0 Fall 2013
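The control limits in Example 4.24 are the mean plus and minus three standard deviations, with the lower limit floored at zero because an infection rate cannot be negative. A minimal Python sketch using the two values from the slide (the `in_control` helper is hypothetical):

```python
mean_rate = 0.0072
stdev_rate = 0.0053

# z = -3 and z = +3 translated back into infection rates.
lower_limit = max(0.0, mean_rate - 3 * stdev_rate)  # negative value is set to 0
upper_limit = mean_rate + 3 * stdev_rate

def in_control(rate):
    """True when an observed rate falls inside the control limits."""
    return lower_limit <= rate <= upper_limit

print(lower_limit, round(upper_limit, 4))
```

A monthly rate above the upper limit would be treated as a signal worth investigating rather than as ordinary variation.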
Statistical Thinking in Business Decisions
4-306
Example 4.25 Variation in Sample Data
Population: 250 computer repair times
μ = 14.91 days, σ² = 35.5 days²

Figure 4.33
Two samples of size n = 50
Version 1.0 Fall 2013
Statistical Thinking in Business Decisions
4-307
Example 4.25 (continued) Variation in Sample Data
The two n = 25 samples have higher variation than the
population and the n = 50 samples.

Figure 4.34
Two samples of size n = 25
Version 1.0 Fall 2013
Follow example videos.
Excel Tool

Statistical Thinking and Tools
308 Version 1.0 Fall 2013
Version 1.0 Fall 2013 309
Student Exercise 4.27
Data Set: Airport Service Times

Problem: -

Your Job:
Compute the z-scores for the data. How many observations fall farther than
3 standard deviations from the mean? Would you consider these as outliers?
Why or why not?
Follow example videos.
Statistical Thinking and Tools
Statistical thinking is the philosophy of learning and action based
on the following fundamental principles:
all work occurs in a system of interconnected processes - a
process being a chain of activities that turns inputs into
outputs;
variation, which gives rise to uncertainty, exists in all
processes; and
understanding and reducing variation are keys to success.
310 Version 1.0 Fall 2013
Follow example videos.
Statistical Thinking and Tools
All three principles work together to create the power of
statistical thinking.

The definition highlights several key components:
process thinking;
understanding and managing uncertainty;
and using data whenever possible to guide actions and
improve decision-making.

311 Version 1.0 Fall 2013
Modern Portfolio Management Theory.
Statistical Thinking Example
312 Version 1.0 Fall 2013
Process Control Charts.
Statistical Thinking Example
313 Version 1.0 Fall 2013
Version 1.0 Fall 2013 4-314
Chapter 4 - Key Terms
Arithmetic mean (mean)
Bimodal
Chebyshev's theorem
Coefficient of kurtosis
Coefficient of skewness
Coefficient of variation
Correlation
Correlation coefficient
(Pearson product
moment correlation
coefficient)


Covariance
Dispersion
Empirical rules
Interquartile range
(midspread)
Kurtosis
Median
Midrange
Mode
Outlier
Version 1.0 Fall 2013 4-315
Chapter 4 - Key Terms (continued)
Population
Process capability
index
Proportion
Range
Return to risk
Sample
Sample correlation
coefficient
Skewness

Standard deviation
Standardized value (z-
score)
Statistical thinking
Unimodal
Variance
Learning the Ways to Describe Data
Chapter 4: Learning Objectives
1. Explain the difference between a population and a
sample.
2. Understand statistical notation.
3. Use measures of location to better understand
data and support business decisions.
4. Use measures of dispersion such as variance,
standard deviation, standard values (z score) to
better understand data and support business
decisions.
5. Explain the nature of shape: skewness and
kurtosis in a distribution.
316 Version 1.0 Fall 2013
317
Chapter 4 Questions from last semester's Mid-Term or Final Exam
Version 1.0 Fall 2013
Statistical Thinking in Business Decisions
4-320
Analytics in Practice:
Applying Statistical Thinking to
Detecting Financial Problems
Sarbanes-Oxley Act (2002)
helped improve the quality of data that companies
disclose to the public but companies can still commit
financial fraud.
Anomaly detection scores (a form of z-score) are often
used by the SEC to detect companies committing
financial fraud.
Version 1.0 Fall 2013
Details: BAS 120 Blackboard site at
www.waketech.edu
BAS 120 Chapter 5: Probability
Distributions and Data Modeling
321 Version 1.0 Fall 2013
Discussion 6: How we used data to win ?
Topics:
Basic Concepts of Probability.
Defining Probability.
Events and Complements.
Mutually Exclusive.
Conditional Probability.
Multiplication Law.
Independent Events
Probability Distributions.
Random Sampling
Learning the Basics of Probability Distributions
Probability Distribution & Data Modeling
Course: BAS 120
Chapter 5
323 Version 1.0 Fall 2013
Probability: Likelihood that an
event or outcome occurs.
From 0 to 1 or 0% to 100%.
Experiment: Process that results
in an outcome.
Outcome: Observed result.
Sample Space: collection of all
possible outcomes.
Event: collection of one or more
outcomes from sample space.
A Few Terms
Learning the Basics of Probability Distributions
324 Version 1.0 Fall 2013
Probability : Likelihood of
getting a 6.
Experiment: Process of rolling
the dice once.
Outcome: Observed result.
Sample Space: 1,2,3,4,5,6.
Event: Getting a 6.
Getting a 6 on one roll of a die: Terms
Learning the Basics of Probability Distributions
325 Version 1.0 Fall 2013


Describing Probability
Learning the Basics of Probability Distributions
326 Version 1.0 Fall 2013
Classical Definition of Probability
Learning the Basics of Probability Distributions
Probability = Total successful outcomes / Total possible outcomes
Probability of rolling a 6 on one roll of a die: 1/6
Limitations:
Sometimes we don't know all the possible outcomes.
327 Version 1.0 Fall 2013
Relative Frequency: Empirical Data
Learning the Basics of Probability Distributions
Repair Time.
Estimate via the data on
repair time.
Probability = Total successful outcomes / Total possible outcomes
Limitations:
Only as good as our data, and if the data changes, the probability estimate may change.
328 Version 1.0 Fall 2013
Subjective Probability: Experts.
Learning the Basics of Probability Distributions
Super Bowl Outcome.
Stock Market Growth.
Estimates from Experts.
Probability = Total successful outcomes / Total possible outcomes
Limitations:
Only as good as your experts.
329 Version 1.0 Fall 2013
Probability Definitions
Learning the Basics of Probability Distributions
Probability = Total successful outcomes / Total possible outcomes
Definition | Basis | Limitations
Classical | All outcomes. | Often we don't know them all.
Relative Frequency | Empirical data. | Only as good as our data.
Subjective | Expert opinions. | Only as good as our experts.
330 Version 1.0 Fall 2013
Probability Rules
Learning the Basics of Probability Distributions
Rule 1: The probability of an event is the sum of the probabilities of the outcomes
that comprise the event.

P (5 or 6) = P (5) + P (6)

Probability = Total successful outcomes / Total possible outcomes
332 Version 1.0 Fall 2013
Probability Rules : Complement
Learning the Basics of Probability Distributions
Probability = Total successful outcomes / Total possible outcomes
Rule 2: The probability of the complement of an event is
1 - the probability of the event.
P (5 or 6) = 1/6 + 1/6 = 2/6
P (not rolling a 5 or 6) = 1 - 2/6 = 4/6
333 Version 1.0 Fall 2013
Mutually Exclusive
Learning the Basics of Probability Distributions
Mutually Exclusive: two events are mutually exclusive if they have no
outcomes in common.
Rule 3: If events A and B are mutually exclusive then
P (A or B) = P (A) + P (B)
P (heads or tails) =
P (heads) = 1/2
+ P (tails) = 1/2
= 1

335 Version 1.0 Fall 2013
Mutually Exclusive
Learning the Basics of Probability Distributions
P (odd or 6): Yes, mutually exclusive.
P (odd) = 18/36
+ P (6) = 5/36
= 23/36

Sum of rolling a die twice.
336 Version 1.0 Fall 2013
Non - Mutually Exclusive
Learning the Basics of Probability Distributions
Probability that a student likes Math or Science.
60% + 55% = 115%, which exceeds 100% because students who like both are counted twice.
337 Version 1.0 Fall 2013
Non - Mutually Exclusive
Learning the Basics of Probability Distributions
60% (Math) + 55% (Science) - 40% (like both) = 75%
Probability that a student likes Math or Science.
338 Version 1.0 Fall 2013
Non - Mutually Exclusive
Learning the Basics of Probability Distributions
Rule 4: Non- Mutually Exclusive:

P (A or B) = P (A) + P (B) - P(A and B)
339 Version 1.0 Fall 2013
Non Mutually Exclusive
Learning the Basics of Probability Distributions
P (odd or 7): No, not mutually exclusive (7 is both odd and 7).
P (odd) = 18/36
+ P (7) = 6/36
- P (7 and odd) = 6/36
= 18/36
Sum of rolling a die twice.
340 Version 1.0 Fall 2013
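The two dice examples (Rule 3 and Rule 4) can be checked by enumerating all 36 equally likely rolls. The sketch below is illustrative; the `prob` helper is hypothetical and uses exact fractions so the results match the slides.

```python
from fractions import Fraction
from itertools import product

# Sum of the two dice for each of the 36 equally likely rolls.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

def prob(event):
    """Classical probability: successful outcomes / possible outcomes."""
    return Fraction(sum(1 for s in sums if event(s)), len(sums))

p_odd = prob(lambda s: s % 2 == 1)
p_six = prob(lambda s: s == 6)
p_seven = prob(lambda s: s == 7)

# Rule 3: odd and 6 share no outcomes, so the probabilities simply add.
p_odd_or_six = p_odd + p_six

# Rule 4: every 7 is also odd, so subtract the overlap to avoid double counting.
p_odd_and_seven = prob(lambda s: s % 2 == 1 and s == 7)
p_odd_or_seven = p_odd + p_seven - p_odd_and_seven

print(p_odd_or_six, p_odd_or_seven)
```

Fractions print in lowest terms, so 18/36 appears as 1/2.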
Conditional Probability
Learning the Basics of Probability Distributions
We know a prior outcome (condition):
Card game such as poker: we know we are dealt two
aces. P (winning hand) goes way up.
Business: P (buying a new car this year), under the
condition (given) that my existing car is 5 years old,
goes way up.
Business: P (buying a gutter guard) may go way up
under the condition that it's an older home or in an
upscale area.
342 Version 1.0 Fall 2013
Conditional Probability
Learning the Basics of Probability Distributions
Age of Car
 | < 1 year | >= 1 year but <= 5 years | Over 5 years | Total
Purchased new car in last year | 25 | 50 | 200 | 275
Did not purchase car in last year | 250 | 200 | 150 | 600
Total | 275 | 250 | 350 | 875

Event A: Purchased a new car in last year | 31.43% | (275/875)
Event B: Car over 5 years old | 40.00% | (350/875)
P (A and B) | 22.86% | (200/875)
P (A given B) | 57.14% | (200/350)
Formula | 57.14% | P (A and B) / P (B)
343 Version 1.0 Fall 2013
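The conditional probability in the car table can be reproduced from the counts alone. The following is a minimal Python sketch; the dictionary layout and key names are illustrative choices, not part of the course materials.

```python
from fractions import Fraction

# Counts from the slide: purchase status by age of current car.
counts = {
    "purchased": {"<1": 25, "1-5": 50, ">5": 200},
    "not_purchased": {"<1": 250, "1-5": 200, ">5": 150},
}
total = sum(sum(row.values()) for row in counts.values())

p_a = Fraction(sum(counts["purchased"].values()), total)  # Event A: purchased
p_b = Fraction(counts["purchased"][">5"] + counts["not_purchased"][">5"], total)  # Event B: over 5 years
p_a_and_b = Fraction(counts["purchased"][">5"], total)

# Conditional probability formula: P(A given B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(float(p_a), float(p_b), float(p_a_and_b), float(p_a_given_b))
```

Knowing the car is over 5 years old raises the purchase probability from about 31% to about 57%, which is the practical point of conditioning.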
Conditional Probability
Learning the Basics of Probability Distributions
The Discussion Board this week is
on conditional probability: the
Monty Hall Problem.
344 Version 1.0 Fall 2013
Multiplication Rule
Learning the Basics of Probability Distributions
Probability of two events both occurring, given the two events are independent:

P (A and B) = P (A) * P (B)

Example:
P (I will win lottery week 1 and I will win lottery week 2)

P (I will win lottery week 1) = 1/100,000
P (I will win lottery week 2) = 1/75,000
P (I win both weeks) = 1/7,500,000,000
346 Version 1.0 Fall 2013
Multiplication Rule
Learning the Basics of Probability Distributions
Assume a 90% foul shooter.

P (person makes 5 in row)

.9 * .9 * .9 * .9 * .9 = .59 or 59%

Or .9 ^ 5.

P (person makes 25 in row)

.9 ^ 25 = .072 or about 7%
347 Version 1.0 Fall 2013
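Both multiplication rule examples reduce to multiplying the probabilities of independent events. A minimal Python sketch of the arithmetic, using the values from the slides:

```python
from fractions import Fraction

# Lottery: winning in week 1 and week 2 are treated as independent events.
p_week_1 = Fraction(1, 100_000)
p_week_2 = Fraction(1, 75_000)
p_both_weeks = p_week_1 * p_week_2

# Foul shots: a 90% shooter, each shot assumed independent of the others.
p_make = 0.9
p_five_in_row = p_make ** 5
p_twenty_five_in_row = p_make ** 25

print(p_both_weeks, round(p_five_in_row, 2), round(p_twenty_five_in_row, 3))
```

Even a very reliable shooter makes 25 in a row only about 7% of the time, because small chances of failure compound across many trials.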
Independent Events
Learning the Basics of Probability Distributions
Two events are independent if P(A|B) = P(A).

No influence: the fact that event B has happened does
not change the probability of event A.

Examples:
Flip of a coin.
Roll of a die.
Foul shots (in theory).
349 Version 1.0 Fall 2013
Probability Distribution and Data
Modeling
Course: BAS 120
Chapter 5
Topics:
Basic Concepts of Probability.
Probability Distributions.
Random Variable.
Discrete vs. Continuous.
Probability Distribution.
Expected Value.
Variance of a Random Variable.
Binomial, Poisson, Normal, More.
Excel Tools for Statistics.
Random Sampling
350 Version 1.0 Fall 2013
Random Variable
Learning the Basics of Probability Distributions
Discrete : can be counted : dice roll, coin flips.
Continuous: time or rate of return.

A random variable is a variable (typically represented
by x) that has a single numerical value that is determined
by chance.
351 Version 1.0 Fall 2013
Data vs. Variable
Learning the Basics of Probability Distributions
Experiment of a coin flip 2 times.
# Heads % of Data
0 21.0%
1 49.0%
2 30.0%
Data: 100 trials of flipping a coin twice
352 Version 1.0 Fall 2013
Data and Random Variables
Learning the Basics of Probability Distributions
Experiment of a coin flip 2 times.
# Heads Probability
0 25.0%
1 50.0%
2 25.0%
Random variable: theoretical distribution for a coin flipped twice
353 Version 1.0 Fall 2013
Data and Random Variables
Learning the Basics of Probability Distributions
Experiment of a coin flip 2 times.
# Heads Probability
0 25.0%
1 50.0%
2 25.0%
Random variable: theoretical distribution for a coin flipped twice
# Heads % of Data
0 21.0%
1 49.0%
2 30.0%
Data: 100 trials of flipping a coin twice
354 Version 1.0 Fall 2013
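The difference between observed data and a random variable's theoretical distribution can be illustrated with a small simulation. This is a sketch only: the `simulate` helper and the fixed seed are assumptions for illustration, and the simulated percentages will not match the 21%/49%/30% on the slide.

```python
import random
from collections import Counter

# Theoretical distribution of the number of heads in two fair coin flips.
theoretical = {0: 0.25, 1: 0.50, 2: 0.25}

def simulate(trials, seed=0):
    """Flip a fair coin twice per trial and return the observed proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(0, 1) + rng.randint(0, 1) for _ in range(trials))
    return {heads: counts[heads] / trials for heads in (0, 1, 2)}

observed = simulate(100)
for heads in (0, 1, 2):
    print(heads, theoretical[heads], observed[heads])
```

With only 100 trials the observed proportions wander around the theoretical values; increasing `trials` generally brings them closer.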
Probability Distribution
Learning the Basics of Probability Distributions
A probability distribution is a graph, table, or formula that gives
the probability for each value of the random variable.
Several Types
Binomial
Poisson
Normal
Exponential
356 Version 1.0 Fall 2013
Expected Value
Learning the Basics of Probability Distributions
The expected value of a random variable corresponds to
the notion of the mean or average for the sample.
If you were going to estimate
(expected value) of what your
golf score (random variable)
was for any given day , how
would you do it?
357 Version 1.0 Fall 2013
Expected Value
Learning the Basics of Probability Distributions
The expected value of a random variable corresponds to
the notion of the mean or average for the sample.
358 Version 1.0 Fall 2013
Expected Value
Learning the Basics of Probability Distributions
The expected value is used in business often.
Hotel, Airlines, Reservations . How
many cancellations do we expect (EV).
Charitable Raffles, Lotteries.
359 Version 1.0 Fall 2013
Probability Distribution: Variance
Learning the Basics of Probability Distributions
Mean is our Expected Value, however we need to know
how much variance exists in our distribution?
Example in Detail: In summary once we know the mean,
we just measure how far away from the mean each of
our outcomes is and basically take an average to find
the variance of the distribution.
360 Version 1.0 Fall 2013
Probability Distribution: Variance
Learning the Basics of Probability Distributions
EV or Expected Value
Divide by N to get
an average, or
multiply by the
probability of the
event f(x), which is
another way to get
an average.
361 Version 1.0 Fall 2013
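Weighting each outcome by its probability, as described above, can be sketched in Python for the number of heads in two coin flips. The distribution is the theoretical one from the earlier coin-flip slide; the variable names are illustrative.

```python
# Probability distribution f(x): number of heads in two fair coin flips.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# Expected value: each outcome weighted by its probability.
expected_value = sum(x * p for x, p in distribution.items())

# Variance: squared distance from the expected value, weighted the same way.
variance = sum((x - expected_value) ** 2 * p for x, p in distribution.items())
standard_deviation = variance ** 0.5

print(expected_value, variance, round(standard_deviation, 4))
```

Multiplying by f(x) plays the role of dividing by N: outcomes that occur more often contribute proportionally more to the average.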
Standard Score : Z Score Normal
Learning the Basics of Probability Distributions
Mean, Variance, Standard Deviation, calculate a standard score.
362 Version 1.0 Fall 2013
Numerous Probability Distributions
Learning the Basics of Probability Distributions
Distribution | Summary | Excel
Binomial (Discrete) | Describes the number of successes in a series of independent Yes/No experiments, all with the same probability of success. | BINOM.DIST
Poisson (Discrete) | Describes the number of occurrences of an event in a fixed interval of time or space, given a known average rate. | POISSON.DIST
Normal (Contin.) | Know the population mean and SD. Central Limit Theorem. | NORM.DIST
DIST: Know an outcome, wish to know the probability.
INV: Know the probability, wish to know the outcome.
363 Version 1.0 Fall 2013
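The DIST versus INV distinction can be sketched with Python's standard library. This is an illustration under stated assumptions: the two probability mass functions are written out by hand from their textbook formulas, and `statistics.NormalDist` stands in for the Excel normal functions.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """P(exactly k occurrences in an interval with the given average rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# "DIST" direction: know the outcome, want the probability.
prob_below_one = standard_normal.cdf(1.0)

# "INV" direction: know the probability, want the outcome.
outcome_at_95_percent = standard_normal.inv_cdf(0.95)

print(binomial_pmf(1, 2, 0.5), round(poisson_pmf(0, 2.0), 4))
print(round(prob_below_one, 4), round(outcome_at_95_percent, 3))
```

The binomial call reproduces the 50% chance of exactly one head in two coin flips from the earlier slides.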
Probability Distribution and Data
Modeling
Course: BAS 120
Chapter 5
Topics:
Basic Concepts of Probability.
Probability Distributions.
Random Sampling.
Random Numbers.
Simulation with Analytic Solver.
Data Modeling in Business.
Analytic Solver to Best Fit Data.
364 Version 1.0 Fall 2013
Example 5.2
Relative Frequency Definition of Probability
Probability a computer is repaired in 10 days
= 0.076
Version 1.0 Fall 2013 5-365
Basic Concepts of Probability
Figure 5.1
Example 5.3
Computing the Probability of an Event
Consider the events:
Rolling 7 or 11 on two dice
Probability = 6/36 + 2/36 = 8/36.
Repair a computer in 7 days or less
Probability = P(O1) + P(O2) + P(O3) + P(O4) + P(O5) + P(O6) + P(O7)
= 0 + 0 + 0 + 0 + .004 + .008 + .020
= 0.032

Version 1.0 Fall 2013 5-366
Basic Concepts of Probability
From Figure 5.1
Example 5.4 Computing the Probability of the Complement of
an Event
A^c, the complement of A, consists of all outcomes in the
sample space not in A.
Dice example:
A = {7, 11}
P(A) = 8/36
A^c = {2, 3, 4, 5, 6, 8, 9, 10, 12}
P(A^c) = 1 - 8/36 = 28/36
Version 1.0 Fall 2013 5-367
Basic Concepts of Probability
Version 1.0 Fall 2013 368
Student Exercise 5.3
Data Set: None

Problem: Three coins are dropped on a table.

Your Job:
List all possible outcomes in the sample space.
Find the probability associated with each outcome.
Example 5.5 Computing the Probability of Mutually Exclusive
Events
Mutually exclusive events have no outcomes in common.
Dice Example:
A = {7, 11}
B = {2, 3, 12}
P(A or B) = UNION of events A and B
= P(A) + P(B)
= 8/36 + 4/36 = 12/36
Version 1.0 Fall 2013 5-369
Basic Concepts of Probability
Example 5.6 Computing the Probability of Non-
Mutually Exclusive Events

Dice Example:
A = {2, 3, 12}
B = {even number}
P(A or B) = UNION of events A and B
= P(A) + P(B) - P(A and B)
= 4/36 + 18/36 - 2/36
= 20/36

Example 5.7 Conditional Probability in Marketing
The data show the first and
second purchases for a
sample of 200 customers.
Probability of purchasing an
iPad given that the customer
already purchased an iMac = 2/13
Figure 5.2
Figure 5.3
Student Exercise 5.6
Data Set: None

Problem: Three coins are dropped on a table.

Your Job:
Let A be the event exactly 2 heads. Find P (A).
Let B be the event at most 1 head. Find P (B).
Let C be the event at least 2 heads. Find P (C).
Are the events A and B mutually exclusive? Find P(A or B).
Are the events A and C mutually exclusive? Find P(A or C).

Example 5.8 Computing a Conditional Probability in a Cross-
Tabulation
Probability of preferring Brand 1 given that a respondent is
male = 25/63
Figure 5.4
Example 5.9
Using the Conditional Probability Formula
Probability of A given B:
P(B_1|M) = P(B_1 and M)/P(M)
= (25/100)/(63/100)
= 25/63 = 0.397
Summary of conditional probabilities:
Example 5.10
Using the Multiplication Law of Probability

Texas Hold 'Em Poker Game
Probability of pocket aces (two aces in hand):
P(Ace on first card and Ace on second card)
= P(A_1 and A_2)
= P(A_2|A_1) P(A_1)
= (3/51) (4/52)
= 0.004525
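The multiplication law above is easy to reproduce with exact fractions. A minimal Python sketch follows; the variable names are assumptions chosen for readability.

```python
from fractions import Fraction

# P(A1): first card is an ace (4 aces in 52 cards).
p_a1 = Fraction(4, 52)
# P(A2 | A1): second card is an ace given the first was (3 aces left in 51 cards).
p_a2_given_a1 = Fraction(3, 51)

# Multiplication law: P(A1 and A2) = P(A2 | A1) * P(A1).
p_pocket_aces = p_a2_given_a1 * p_a1
print(p_pocket_aces, float(p_pocket_aces))
```

The exact result is 1/221, which rounds to the decimal shown on the slide.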
Example 5.11 Determining if Two Events are
Independent
Are Gender and Brand Preference Independent?




Is P(B_1|M) = P(B_1)?
0.397 ≠ 0.34
Gender and Brand Preference are dependent.
Student Exercise 5.9
Data Set: None

Problem: A survey of 100 recent college graduates found that 50 owned only
mutual funds, 35 owned only stocks, and 15 owned both.

Your Job:
What is the probability that an individual owns a stock? A mutual fund?
What is the probability that an individual owns neither?
What is the probability that an individual owns either a stock or a mutual
fund?


Example 5.12
Using the Multiplication Law for Independent Events

Dice Roll Example:
Rolls of a pair of dice are independent events since each roll
does not depend on the previous rolls.
A = {roll a sum of 6 on the first roll of the pair}
B = {roll a sum of 2, 3, or 12 on the second roll of the pair}
P(A and B) = P(A) P(B)
= (5/36) (4/36) = 0.0154
Example 5.13
Discrete and Continuous Random Variables
Examples of Discrete Variables:
outcomes of dice rolls
whether a customer likes or dislikes a product
number of hits on a Web site link today
Examples of Continuous Variables:
weekly change in DJIA
daily temperature
time between machine failures
Random Variables and
Probability Distributions
Student Exercise 5.12
Data Set: Census Education Data

Problem: Using the Civilian Labor Force data find the following.

Your Job:
P(unemployed and advanced degree)
P(not a high school grad | unemployed)
Are the events "unemployed" and "at least a high school graduate"
independent?

Example 5.14
Probability Distribution of Dice Rolls

Figure 5.5
Example 5.15 A Subjective Probability Distribution
Distribution of an expert's assessment of how the DJIA might
change next year.
Figure 5.6
Example 5.16
Probability Mass Function for Rolling Two Dice
f(x_2) = 1/36
f(x_3) = 2/36
f(x_4) = 3/36
f(x_5) = 4/36
f(x_6) = 5/36
:
f(x_12) = 1/36

Discrete Probability Distributions
Figure 5.5
Example 5.17
Using the Cumulative Distribution Function
Probability of rolling between 4 and 8:
= P(4 ≤ X ≤ 8)
= P(3 < X ≤ 8)
= F(x_8) − F(x_3)
= 13/18 − 1/12
= 23/36
Figure 5.7
Example 5.18 Computing the Expected Value
of the sum of values on 2 die rolls



E[X] = 2(1/36) + 3(1/18) + … + 12(1/36) = 7
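The probability mass function, cumulative distribution function, and expected value for the sum of two dice (Examples 5.16 to 5.18) can be built from a simple count. This is a Python sketch, assuming exact fractions are preferred over decimals; `pmf` and `cdf` are illustrative names.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 rolls produce each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability mass function f(x) for x = 2..12.
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

def cdf(x):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# P(4 <= X <= 8) = F(8) - F(3)
p_4_to_8 = cdf(8) - cdf(3)

# Expected value: sum of x * f(x) over all outcomes.
expected = sum(x * p for x, p in pmf.items())
print(p_4_to_8, expected)
```

Subtracting F(3) rather than F(4) keeps the outcome 4 inside the interval, which is why the slide rewrites the event as 3 < X ≤ 8.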
Figure 5.8
Student Exercise 5.19
Data Set: None

Problem: The weekly demand of a slow-moving product has the following
probability mass function.









Your Job:
Find the expected value, variance, and standard deviation of weekly
demand.
Demand, x    Probability, f(x)
0            0.1
1            0.3
2            0.4
3            0.2
4 or more    0
Example 5.19 Expected Value on Television
Apprentice example
Teams were required to select an artist
(mainstream or avant-garde) and sell their art for
the most money possible.
Deal or No Deal example
Contestant had 5 briefcases left with $100, $400,
$1000, $50,000 or $300,000 in them.
Expected value of briefcases is $70,300.
Banker offered contestant $80,000 to quit.
Example 5.20 Expected Value of Charitable Raffle
Cost of raffle ticket is $50
1000 raffle tickets were sold.
Prize for winning raffle is $25,000




E[X] = −$50(0.999) + $24,950(0.001) = −$25
Example 5.21 Airline Revenue Management
Full and discount airfares are available for a flight.
Full-fare ticket costs $560
Discount ticket costs $400
X = ticket price paid
p = 0.75 (the probability of selling a full-fare ticket)
E[X] = 0.75($560) + 0.25(0) = $420
The airline should not discount full-fare tickets because the
expected value of a full-fare ticket is greater than the cost
of a discount ticket.
Break-even point: $400 = p($560) or p = 0.714


Example 5.22
Computing the Variance of a Random Variable

Figure 5.9
Example 5.23 Using the Bernoulli Distribution
Model whether an individual responds positively to
a telemarketing promotion.
You have a box with 20 red and 80 white marbles.
You ask individuals exposed to the telemarketing
promotion to select a marble and then replace it.
If the customer selects a red marble, the
customer makes a purchase.
If the customer selects a white marble, the
customer does not make a purchase.

Example 5.24 Computing Binomial Probabilities
Suppose 10 individuals receive the
telemarketing promotion.
Each individual has a 0.2 probability of making
a purchase.
Find the probability that exactly 3 of the 10
individuals make a purchase.
Example 5.25
Using Excel's Binomial Distribution Function

P(x = 3) = 0.20133
= f(3)
=BINOM.DIST(3, 10, 0.2, false)
P(x ≤ 3) = 0.87913
= F(3)
=BINOM.DIST(3, 10, 0.2, true)
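For readers who want to see what the Excel function computes, here is a minimal Python sketch of the binomial formula. The function names `binom_pmf` and `binom_cdf` are assumptions for illustration; in Excel terms, the first corresponds to cumulative = FALSE and the second to cumulative = TRUE.

```python
from math import comb

def binom_pmf(k, n, p):
    """f(k): probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """F(k): probability of at most k successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# 10 individuals, each with a 0.2 probability of purchasing.
f3 = binom_pmf(3, 10, 0.2)   # exactly 3 purchases
F3 = binom_cdf(3, 10, 0.2)   # at most 3 purchases
print(round(f3, 5), round(F3, 5))
```

The cumulative value is simply the sum of the individual probabilities for 0, 1, 2, and 3 purchases.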

Figure 5.10
True: F(x)
False: f(x)
Histogram Example of the Binomial Distribution

Figure 5.11a
Symmetric when p = 0.5
Histogram Examples of the Binomial Distribution

Figure 5.11b
Positively skewed when p < 0.5
Negatively skewed when p > 0.5
Figure 5.11c
Poisson Distribution
Models the number of occurrences in some unit of measure
(often time or distance).
There is no limit on the number of occurrences.
The average number of occurrences per unit is a constant
denoted as λ.
Example 5.26 Computing Poisson Probabilities
Suppose the average number of customers
arriving at a Subway restaurant during lunch hour
is λ = 12 per hour.
The probability that exactly x customers arrive
during the hour is given by the Poisson
distribution.
Find the probability that exactly 5 arrive during
lunch hour:
f(5) = e^(−12) (12^5)/5!
= (0.000006144)(248,832)/120
= 0.01274
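The Poisson formula can be evaluated directly to check the arithmetic. This is a short Python sketch; `poisson_pmf` is a hypothetical helper name, not an Excel or course function.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x) = e^(-lam) * lam^x / x! for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Probability that exactly 5 customers arrive when the mean is 12 per hour.
p5 = poisson_pmf(5, 12)
print(round(p5, 5))
```

Evaluating the three factors gives a probability of a little over 1%, so seeing only 5 arrivals in an hour that averages 12 is unusual.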
Example 5.27 Using Excel's
Poisson Distribution Function
POISSON.DIST(x, mean, cumulative)
Figure 5.12
Figure 5.13
True: F(x)
False: f(x)
Student Exercise 5.21
Data Set: Consumer Transportation Survey

Problem:

Your Job:
Based on the data develop a probability mass function and cumulative
distribution function (both tabular and as charts) for the random variable
Number of Children. What is the probability that an individual in this survey
has fewer than 2 children? At least 2 children? Five or more children?
Analytics in Practice: Using the Poisson Distribution for
Modeling Bids on Priceline
Pricing strategies for Kimpton
hotels on Priceline are modeled
using a Poisson distribution.
The number of bids placed
per day 3 days before arrival
follows f(x) = e^(−6.3) (6.3^x)/x!.
Using the model increased
sales 11% in one year.

Example 5.28
Computing Uniform Probabilities
Sales revenue for a product varies uniformly
each week between $1000 and $2000.
f(x) = 1/(2000-1000)
= 1/1000
Continuous Probability Distributions
Figure 5.15
Area = 1
Example 5.28 (continued)
Computing Uniform Probabilities
Find the probability sales revenue will be less
than $1,300.
P(X < 1300) = (1300-1000)(1/1000) = 0.30
Figure 5.16
Example 5.28 (continued) Uniform Probabilities
Find the probability that revenue will be between
$1,500 and $1,700.



P(1500 ≤ X ≤ 1700) = P(X ≤ 1700) − P(X ≤ 1500)
= F(1700) − F(1500)
= 700/1000 − 500/1000
= 0.20
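The uniform probabilities in Example 5.28 follow from the cumulative distribution function F(x) = (x − a)/(b − a). A minimal Python sketch is shown below; `uniform_cdf` is an illustrative name.

```python
def uniform_cdf(x, a, b):
    """F(x) for a continuous uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 1000, 2000  # weekly sales revenue range in dollars

# P(X < 1300)
p_below_1300 = uniform_cdf(1300, a, b)
# P(1500 <= X <= 1700) = F(1700) - F(1500)
p_1500_1700 = uniform_cdf(1700, a, b) - uniform_cdf(1500, a, b)
print(p_below_1300, round(p_1500_1700, 2))
```

Because the density is flat, each probability is just the width of the interval divided by the total range of 1000.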
Figure 5.17

=NORM.DIST(x, mean, stdev, cumulative)

Figure 5.18
True: F(x)
False: f(x)
Student Exercise 5.26
Data Set: None

Problem:

Your Job:
A popular resort hotel has 300 rooms and is usually fully booked. About 4% of
the time a reservation is canceled before the 6pm deadline with no penalty.
What is the probability that at least 280 rooms will be occupied? Use the
binomial distribution to find the exact value and the normal approximation to
the binomial and compare to your answer.
Example 5.29 Using NORM.DIST to Compute
Normal Probabilities
The distribution for customer demand (units per
month) is normal with:
mean = 750
stdev. = 100
Find the probability that demand will be:
a) at most 900 units/month
b) exceed 700 units/month
c) be between 700 and 900 units/month
Example 5.29 (continued) Using
NORM.DIST to Compute
Normal Probabilities
Cumulative probabilities
are computed as:
=NORM.DIST(x, 750, 100, true)

a) P(X < 900) = 0.9332
b) P(X > 700) = 1 − 0.3085 = 0.6915
c) P(700 < X < 900) = 0.9332 − 0.3085
= 0.6247
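The same three probabilities can be reproduced outside Excel. This sketch uses `NormalDist` from Python's standard `statistics` module, whose `cdf` method plays the role of NORM.DIST with cumulative = TRUE; the variable names are assumptions.

```python
from statistics import NormalDist

# Monthly customer demand: normal with mean 750 and standard deviation 100.
demand = NormalDist(mu=750, sigma=100)

p_at_most_900 = demand.cdf(900)                 # a) P(X < 900)
p_exceeds_700 = 1 - demand.cdf(700)             # b) P(X > 700)
p_between = demand.cdf(900) - demand.cdf(700)   # c) P(700 < X < 900)

print(round(p_at_most_900, 4), round(p_exceeds_700, 4), round(p_between, 4))
```

Parts b) and c) show the general pattern: an "exceeds" probability is one minus a cumulative value, and a "between" probability is the difference of two cumulative values.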
Figure 5.19
Student Exercise 5.28
Data Set: None

Problem:

Your Job:
A finance consultant has an average of 6 customers he consults with each day,
which are assumed to be Poisson distributed. The consultant's overhead
requires that he consult with at least 5 customers in order that fees cover
expenses. Find the probability of 0-4 customers in a given day. What is the
probability that at least 5 customers will schedule his services?
Example 5.29 (continued) Using NORM.DIST to
Compute Normal Probabilities
Figure 5.20a
0.9332
Example 5.29 (continued) Using NORM.DIST to
Compute Normal Probabilities
Figure 5.20b
0.6915
Example 5.29 (continued) Using NORM.DIST to
Compute Normal Probabilities

Figure 5.20c
0.6247
Example 5.30 Using the NORM.INV Function
=NORM.INV(probability, mean, stdev)
provides the x value with F(x)= probability

What level of demand would be exceeded at
most 10% of the time?
Find x such that F(x) = 90%
= NORM.INV(0.90, 750, 100)
results in x = 878.155
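The inverse calculation can be sketched in Python as well. `NormalDist.inv_cdf` from the standard `statistics` module returns the x value for a given cumulative probability, which is the same question NORM.INV answers; the variable name `x90` is illustrative.

```python
from statistics import NormalDist

# Demand is normal with mean 750 and standard deviation 100.
demand = NormalDist(mu=750, sigma=100)

# Find x such that F(x) = 0.90, i.e. demand exceeds x only 10% of the time.
x90 = demand.inv_cdf(0.90)
print(round(x90, 3))
```

Feeding the result back into the cumulative distribution function returns 0.90, which is a quick way to confirm that DIST and INV are inverses of each other.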

Example 5.30 (continued)
Using the NORM.INV Function

Figure 5.20d
878.155
90%
Standard Normal Distribution
Z is the standard normal random variable with:
Mean = 0
Stdev = 1
Figure 5.21
Example 5.31 Computing Probabilities with the Standard
Normal Distribution
Verify the empirical rules using Excel.
P(-1 < Z < 1 )
= NORM.S.DIST(1, true) − NORM.S.DIST(-1, true)
= 0.84134 − 0.15866
= 0.6827
≈ 68%
Figure 5.22
Example 5.32 Computing Probabilities with the Standard
Normal Tables
From Example 5.29, what is the probability that
demand will be at most 900 units/month?
Use the equation z = (x − μ)/σ:
Z = (900 − 750)/100
= 1.50
Using Table 1 in Appendix B, we find:
P(X < 900) = P(Z < 1.50)
= 0.93319
Exponential Distribution
Models the time between randomly occurring
events (arrivals, machine failures, etc.)




f(x) = λe^(−λx), for x ≥ 0
F(x) = 1 − e^(−λx), for x ≥ 0
where λ is the mean
rate of occurrences
(from the discrete
Poisson distribution)

Figure 5.23
with λ = 1
Student Exercise 5.43
Data Set: None

Problem:

Your Job:
Use the Excel Random Number Generation tool to generate 100 samples of the
number of customers that the financial consultant in Problem 5.28 will have on
a daily basis. What percentage will meet his target of at least 5?

Your Job: 5.28 FYI
A finance consultant has an average of 6 customers he consults with each day,
which are assumed to be Poisson distributed. The consultant's overhead
requires that he consult with at least 5 customers in order that fees cover
expenses. Find the probability of 0-4 customers in a given day. What is the
probability that at least 5 customers will schedule his services?

Example 5.33 Using the Exponential Distribution
The mean time to failure of a critical engine
component is μ = 8,000 hours.
What is the probability of failing before 5000
hours?
P(X < x) =EXPON.DIST(x, lambda, cumulative)
Since μ = 1/λ, we can solve for λ:
λ = 1/8000
P(X < 5000) =EXPON.DIST(5000, 1/8000, true)
= 0.4647
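The cumulative exponential probability has a closed form, F(x) = 1 − e^(−λx), so the Excel result can be checked by hand. A minimal Python sketch follows; `expon_cdf` is a hypothetical helper name.

```python
from math import exp

def expon_cdf(x, lam):
    """F(x) = 1 - e^(-lam * x) for an exponential distribution with rate lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

mean_time_to_failure = 8000      # hours
lam = 1 / mean_time_to_failure   # rate is the reciprocal of the mean

# Probability the component fails before 5000 hours.
p_fail_before_5000 = expon_cdf(5000, lam)
print(round(p_fail_before_5000, 4))
```

Note that the function takes the rate λ, not the mean, which is the most common source of error when using EXPON.DIST.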
Example 5.33 (continued)
Using the Exponential Distribution
Figure 5.24
Other Useful Continuous
Distributions
Triangular Distribution
Lognormal Distribution
Beta Distribution
Figure 5.25
Example 5.34 Sampling from the Distribution of Dice Outcomes
Random Sampling from Probability Distributions
Probability distribution Intervals for random sampling
Student Exercise 5.46
Data Set: Computer Repair Time

Problem:

Your Job:
Use Risk Solver Platform to fit a distribution to the data. Try 3 different
statistical measures for evaluating goodness of fit and see if they result in
different best-fitting distributions.


Example 5.34 (continued) Sampling from the Distribution of Dice
Outcomes
=RAND( ) generates random numbers in Excel

Figure 5.26
Outcome = 8 since 0.681 is
between 0.583 and 0.722
Outcome = 4 since 0.119 is
between 0.083 and 0.167
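The interval lookup in Example 5.34 can be sketched in Python: a random number between 0 and 1 is mapped to the first outcome whose cumulative probability reaches it. This is an illustration only, assuming the standard `bisect` module for the lookup; `sample_outcome` and the seed value are assumptions.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate, product

# Number of ways to roll each sum of two dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
outcomes = sorted(counts)  # 2, 3, ..., 12

# Upper edge of each outcome's interval: the cumulative probability F(x).
cum = [c / 36 for c in accumulate(counts[x] for x in outcomes)]

def sample_outcome(u):
    """Return the outcome whose interval contains the random number u."""
    return outcomes[min(bisect_left(cum, u), len(outcomes) - 1)]

print(sample_outcome(0.681), sample_outcome(0.119))

# Ten simulated rolls using random numbers, as =RAND() would supply in Excel.
random.seed(1)  # arbitrary seed so the run is repeatable
sample = [sample_outcome(random.random()) for _ in range(10)]
print(sample)
```

Outcomes with wider intervals, such as 7, are selected more often, so the simulated rolls follow the original probability distribution.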
Example 5.35 Using the VLOOKUP Function
Generate a random sample of Changes in DJIA.
First compute F(x)
Assign intervals to outcomes
Generate random numbers using =RAND( )
=VLOOKUP(H2, $E$2:$G$10, 3)
Figure 5.27
Example 5.36
Using Excel's Random Number Generation Tool
Figure 5.28
Generate 100 outcomes from a Poisson distribution with a mean of 12.
Data → Data Analysis → Random Number Generation
Number of Variables: 1
Number of Random Numbers: 100
Distribution: Poisson
Parameter: Lambda = 12
Example 5.36 (continued) Using Excel's Random Number
Generation Tool
Histogram of 100 random outcomes
Figure 5.29
Example 5.38 Using Risk Solver Platform Distribution Functions
An energy company is considering offering a new
product and needs to estimate the growth in PC
ownership. The expected growth rates are:
Minimum = 5%
Most likely = 7.7%
Maximum =10%
Generate 500 samples of PC ownership growth
rate using: =PsiTriangular(5%, 7.7%, 10%)
Example 5.38 (continued) Using Risk Solver Platform
Distribution Functions

Figure 5.32
Example 5.39 Analyzing Airline Passenger Data
Sample data on passenger demand for 25 flights
Data Modeling and Distribution Fitting
Figure 5.33
Can we assume normally distributed?
Example 5.40 Analyzing Airport Service Times
Sample data on service times for 812 passengers at an
airport's ticketing counter
Figure 5.34
Can we assume normally distributed?
Goodness of Fit
The basis for fitting data to a probability
distribution
Attempts to draw conclusions about the nature of
the distribution
Three statistics measure goodness of fit:
Chi-square
Kolmogorov-Smirnov
Anderson-Darling
Example 5.41
Fitting a Distribution to Airport Service Times
1. Highlight the data
Risk Solver → Tools → Fit
2. Fit Options dialog
Type: Continuous
Test: Kolmogorov-Smirnov

Figure 5.35
Example 5.41 (continued)
Fitting a Distribution to Airport Service Times
Erlang is the best-fitting distribution
Figure 5.36
Random Sampling
Learning the Basics of Probability Distributions
Excel provides a RAND function, which generates a
random number between 0 and 1.
With the random number we can generate simulations
in Excel or add more power with the Analytic Solver add-in
software.
Applications of Simulations
Learning the Basics of Probability Distributions
Service Levels.
Budgets and Forecasts.
Customer Demand.
These are just a few.
Fit tool with Analytic Solver.
Analytics in Practice: The Value of
Good Data Modeling in Advertising
Gross's model:
A mathematical model that relates the relative
contributions of creative and media dollars to
total advertising effectiveness.
Often used to identify the best number of ads to
purchase.
Analysis found that the optimal number of ads
can vary significantly depending on the shape of
the distribution of effectiveness for a single ad.
Chapter 5 Key Terms
Bernoulli distribution
Binomial distribution
Complement
Conditional probability
Continuous random
variable
Cumulative distribution
function
Discrete random
variable


Discrete uniform
distribution
Empirical probability
distribution
Event
Expected value
Experiment
Exponential distribution
Goodness of fit
Independent events

Chapter 5 Key Terms (continued)
Multiplication law of
probability
Mutually exclusive
Normal distribution
Outcome
Poisson distribution
Probability
Probability density
function
Probability distribution

Probability mass function
Random number
Random number seed
Random variable
Random variate
Sample space
Standard normal
distribution
Uniform distribution
Union
Learning the Basics of Probability Distributions
Chapter 5: Learning Objectives
1. Use probability rules and formulas to perform
probability calculations.
2. Explain conditional probability.
3. Determine if two events are independent.
4. Apply multiplication law of probability.
5. Explain difference between discrete and continuous
random variables.
6. Define probability distribution.
7. Describe the normal and standard normal
distributions.

Learning the Basics of Probability Distributions
Chapter 5: Learning Objectives
8. Use the standard normal distribution and z
values to compute normal probabilities.
9. Use Excel Random Number Generator.
10. Fit distributions using Risk Solver Platform.


Chapter 5 Questions from last semester's Mid-Term or Final Exam
Discussion 7: Midpoint. How are we doing?
Learn the language.
Statistical Notation
Mean or Average. (Location)
μ (mu): Population mean.
x̄ (x-bar): Sample mean.
Standard Deviation. (Dispersion)
σ (sigma): Population standard deviation.
s (small s): Sample standard deviation.
Size of Population or Sample
N for population and n for sample.
Proportions
π for population proportion and p for sample
proportion.
Summation
Σ for summation.
Details: BAS 120 Blackboard site at
www.waketech.edu
BAS 120 Chapter 6: Sampling and
Modeling
Topics:
Statistical Sampling.
Est. Population Parameters.
Sampling Error.
Sampling Distributions.
Confidence Intervals.
Sampling Size.
Learning the Basics of Sampling
Sampling and Estimating
Course: BAS 120
Chapter 6
Sampling Plan
Learning the Basics of Sampling
Part of Plan - Comment
Objective - What do we want to accomplish?
Target Population - Which population parameters?
Population Frame - List from which the sample is selected.
Method of Sample - Random or others.
Operational Procedure - Process to collect sample data.
Statistical Tools - How to perform the analysis.
Sampling Methods
Learning the Basics of Sampling
Method - Comment
Random - Use a random number generator (Excel).
Systematic - Select every nth item.
Stratified - Divide into subsets (strata); sample a proportion of each.
Cluster - Divide into clusters; sample the clusters (e.g., Columbus, OH).
Continuous Process - Production line.
Estimating Population Parameters
Learning the Basics of Sampling
Population : Parameters
Mean μ
Standard Deviation σ

Sample : Statistics
Mean x̄
Standard Deviation s

Sampling Error
Learning the Basics of Sampling
Statistical Error: sample is only a
subset of the population. Can be
controlled by the size of the
sample.

Non-Statistical Error: Sample is a
poor representation of the
population (bias). Normally
unknown.
Sampling Error
Learning the Basics of Sampling
It's important to understand the size
of our error. Sampling Distribution.
Candidate 53% +/- 3%
It is important to know how accurate
our sample is.
Point Estimate vs. Interval Estimate
Learning the Basics of Sampling
Point Estimate: Candidate 53%
Interval Estimate: 53% +/- 3%
Central Limit Theorem
Learning the Basics of Sampling
If the sample size is large enough, then the sampling distribution
of the mean is:
- approximately normally distributed regardless
of the distribution of the population
- has a mean equal to the population mean

If the population is normally distributed, then the sampling
distribution is also normally distributed for any sample size.

This theorem is one of the most important practical results in
statistics.
Standard Error of the Mean
Learning the Basics of Sampling
Note that as n increases, our
standard error of the mean, σ/√n,
will in general get smaller.
In other words, our sampling error
is getting smaller.
As n increases, the variance and
standard deviation of our sampling
error get smaller, i.e., our
statistical error is getting smaller.
Confidence Interval
Learning the Basics of Sampling
Interval Estimates
Provide a range for a population characteristic based on a
sample.
A confidence interval of 100(1 − α)% is an interval [A, B] such
that the probability of falling between A and B is 1 − α.
1 − α is called the level of confidence.
90%, 95%, and 99% are common values for 1 − α.
Confidence intervals provide a way of assessing the accuracy
of a point estimate.


Confidence Interval
Learning the Basics of Sampling
Confidence Interval: a range of values within which the
population parameter is believed to lie.
Confidence Interval
Learning the Basics of Sampling
Standard Error of the Mean
How accurate in terms of
a standardized score: Z
Point Estimate
T Distribution
Learning the Basics of Sampling
t Distribution: a family of probability distributions whose variance
is larger than that of the standard normal distribution.
Degrees of Freedom:
Rarely Known
T Distribution
Learning the Basics of Sampling
We often don't know the population standard
deviation.
T Distribution
Learning the Basics of Sampling
T Distribution
Learning the Basics of Sampling
Confidence Interval Proportion
Learning the Basics of Sampling
Standard Error of the Mean
How accurate in terms of
a standardized score: Z
Point Estimate
Confidence Interval Proportion
Learning the Basics of Sampling
Standard Error of the Mean for a
proportion.
Whose outcome is more predictable (i.e., has the smaller variance):
a person who is a 90% foul shooter or a person who is a 50%
foul shooter?
Note: .9 * (1-.9) = .09 vs. .5 * .5 = .25
CI and Sample Size
Learning the Basics of Sampling
With some algebra, we can solve each of the confidence interval
equations for n, our sample size.
Sampling Distribution
Learning the Basics of Sampling
Let's assume the average
height of men is 5'10".

How likely would it be for
us to sample 1 person and
find they have a height of 6'2"?

How likely would it be for
us to sample 64 men and find that
the average height of those 64
men was 6'2"?
Sampling Distribution
Sampling Distribution
Learning the Basics of Sampling
What is going to happen to the degree of variance around our
mean as the sample size increases?

For example: if we wanted to know the average GPA of students
at a college, think about the range for each row in the table below.
Sample Size | Sample Mean | Level of Variance Around Mean
1 Student | 3.5 GPA | Large: confident the mean is between 2.5 and 4.0
20 Students | 3.2 GPA | A little less large: confident the mean is between 2.7 and 3.7
200 Students | 3.0 GPA | Small: confident the mean is between 2.8 and 3.2
Learn the language.
Statistical Notation
Mean or Average. (Location)
μ (mu): Population mean.
x̄ (x-bar): Sample mean.
Standard Deviation. (Dispersion)
σ (sigma): Population standard deviation.
s (small s): Sample standard deviation.
Size of Population or Sample
N for population and n for sample.
Proportions
π for population proportion and p for sample
proportion.
Summation
Σ for summation.
Sample Statistics
Learning the Basics of Sampling
Estimating Population Parameters
Learning the Basics of Sampling
Population Parameters
Learning the Basics of Sampling
Population
Discussion 8: Data Cleansing
Example 6.1 (continued)
A Sampling Plan for a Market Research Study
Objective - estimate the proportion of golfers who would join
the program
Target population - golfers over 25 years old
Population frame - golfers who purchased equipment at
particular stores
Operational procedures - e-mail link to survey or direct-mail
questionnaire
Statistical tools - PivotTables to summarize data by
demographic groups and estimate likelihood of joining the
program


Statistical Sampling
Example 6.2 Simple Random Sampling with Excel
Sample from the
Excel database
Sales Transactions
Data → Data Analysis → Sampling

Periodic selects every nth number
Random selects a simple random sample
Figure 6.1
Example 6.2 (continued)
Simple Random Sampling with
Excel
Samples generated by Excel
Sorted by customer ID
Sampling is done with
replacement so duplicates may
occur.
Figure 6.2
Analytics in Practice: Using Sampling
Techniques to Improve Distribution
MillerCoors brewery wanted to better
understand distributor performance
Defined 7 attributes of proper distribution
Collected data from distributors using stratified
sampling based on market share
Developed performance rankings of distributors
and identified opportunities for improvement
Example 6.3 A Sampling Experiment
A population is uniformly distributed between 0
and 10.
Mean = (0 + 10)/2 = 5
Variance = (10 − 0)^2/12 = 8.333
Use Excel to generate 25 samples of size 10 from
this population. Compute the mean of each.
Prepare a histogram of the 25 sample means.
Prepare a histogram of the 250 observations.
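The sampling experiment can also be sketched outside Excel. The Python below draws 25 samples of size 10 from a uniform population on 0 to 10 and computes each sample mean; the seed value and variable names are arbitrary assumptions, and the actual numbers will differ from those produced in Excel.

```python
import random
from statistics import mean, stdev

random.seed(42)  # arbitrary seed so the experiment is repeatable

# 25 samples, each with 10 observations from a uniform population on [0, 10].
samples = [[random.uniform(0, 10) for _ in range(10)] for _ in range(25)]

# One mean per sample; these 25 values are what the histogram would show.
sample_means = [mean(s) for s in samples]

# All 250 individual observations, for the second histogram.
observations = [x for s in samples for x in s]

print("average of sample means:", round(mean(sample_means), 3))
print("spread of sample means:", round(stdev(sample_means), 3))
print("spread of observations:", round(stdev(observations), 3))
```

The sample means cluster much more tightly around 5 than the individual observations do, which is the point the two histograms are meant to illustrate.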
Sampling Error
Example 6.3 (continued) A Sampling Experiment

Figure 6.3
Example 6.3 (continued) A Sampling Experiment
Repeat the sampling experiment for samples of size 25, 100,
and 500
Table 6.1
Example 6.3 (continued) A Sampling
Experiment

Figure 6.4
Example 6.4 Estimating Sampling Error Using the
Empirical Rules
Using the empirical rule for 3 standard deviations
away from the mean, ~99.7% of sample means
should be between:
[2.55, 7.45] for n = 10
[3.65, 6.35] for n = 25
[4.09, 5.91] for n = 100
[4.76, 5.24] for n = 500

Table 6.1
Example 6.5
Computing the Standard Error of the Mean
For the uniformly distributed population, we found
σ^2 = 8.333
and, therefore, σ = 2.89
Compute the standard error of the mean for sample sizes of
10, 25, 100, 500.
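The standard error of the mean is the population standard deviation divided by the square root of the sample size. A minimal Python sketch for the four sample sizes follows; the dictionary name `std_errors` is illustrative.

```python
from math import sqrt

# Uniform population on [0, 10]: variance = (10 - 0)^2 / 12.
sigma = sqrt((10 - 0) ** 2 / 12)

# Standard error of the mean = sigma / sqrt(n) for each sample size.
std_errors = {n: sigma / sqrt(n) for n in (10, 25, 100, 500)}

for n, se in std_errors.items():
    print(n, round(se, 3))
```

Quadrupling the sample size from 25 to 100 only halves the standard error, so extra precision becomes increasingly expensive.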
Sampling Distributions
For comparison from Table 6.1
Student Exercise 6.2
Data Set: Credit Risk Data

Problem:

Your Job:
The bank wants to sample from this database to conduct a more detailed audit.
Use the Excel Sampling tool to find a simple random sample of 20 unique records.


Central Limit Theorem
If the sample size is large enough, then the
sampling distribution of the mean is:
- approximately normally distributed regardless
of the distribution of the population
- has a mean equal to the population mean
If the population is normally distributed, then the
sampling distribution is also normally distributed
for any sample size.
This theorem is one of the most important
practical results in statistics.
Example 6.6
Using the Standard Error in Probability Calculations
The purchase order amounts for books on a
publisher's Web site are normally distributed with
a mean of $36 and a standard deviation of $8.
Find the probability that:
a) someone's purchase amount exceeds $40
b) the mean purchase amount for 16 customers
exceeds $40
Example 6.6 (continued)
Using the Standard Error in Probability Calculations
a) P(x > 40) = 1 - NORM.DIST(40, 36, 8, TRUE)
= 0.3085
b) The standard error of the mean is 8/SQRT(16) = 2, so
P(x̄ > 40) = 1 - NORM.DIST(40, 36, 2, TRUE)
= 0.0228
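The same two probabilities can be checked with Python's standard library. This is a sketch that mirrors the NORM.DIST calls above; `NormalDist` plays the role of Excel's cumulative normal function.

```python
import math
from statistics import NormalDist

mu, sigma, n = 36, 8, 16

# a) a single purchase exceeds $40
p_single = 1 - NormalDist(mu, sigma).cdf(40)

# b) the mean of 16 purchases exceeds $40: use the standard error sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)  # 8 / 4 = 2
p_mean = 1 - NormalDist(mu, standard_error).cdf(40)

print(round(p_single, 4), round(p_mean, 4))
```

The mean of 16 purchases is far less likely to exceed $40 than a single purchase because averaging reduces variability.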

Example 6.7 Interval Estimates in the News
A Gallup poll might report that 56% of voters
support a certain candidate with a margin of
error of 3%.
We would have a lot of confidence that the
candidate would win.

If, instead, the poll reported a 52% level of
support with a 4% margin of error, we would be
less confident in predicting a win for the
candidate.
Interval Estimates
Example 6.9 Computing a Confidence Interval with an Unknown
Standard Deviation
A large bank has sample data used in making credit decisions.
Give a 95% confidence interval estimate of the mean
revolving balance of homeowner applicants.
Confidence Intervals
Figure 6.7
Example 6.9 (continued) Computing a Confidence Interval with
an Unknown Standard Deviation

= Mean ± T.INV(1 - alpha/2, df)*s/SQRT(n), where df = n - 1
= Mean ± CONFIDENCE.T(alpha, stdev, size)
Figure 6.8
Example 6.10 Computing a Confidence Interval for a Proportion
(of those willing to pay a lower health insurance premium for a
lower deductible)

Figure 6.9
Student Exercise 6.7
Data Set: None

Problem:

Your Job:
A popular soft drink is sold in 2-liter bottles. Because of variation in the filling
process, bottle contents are normally distributed with a mean of 2,000 milliliters
and a standard deviation of 20 milliliters. Note: 2 liters = 2,000 milliliters.

If the manufacturer samples 100 bottles, what is the probability that the sample
mean is less than 1,950 milliliters?

What mean fill volume will be exceeded only 10% of the time for a sample of
100 bottles?


Example 6.10 (continued) Computing a 95% Confidence Interval
for a Proportion

Sample proportion ± NORM.S.INV(1 - alpha/2) *
(standard error of the sample proportion),
where the standard error is SQRT(p(1 - p)/n) for sample proportion p
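A minimal Python sketch of the same interval follows. The counts used in the call (50 of 100) are hypothetical and are not the survey data from Example 6.10; the function name is also an assumption for illustration.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)            # like NORM.S.INV(1 - alpha/2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Hypothetical counts, for illustration only
lower, upper = proportion_confidence_interval(successes=50, n=100)
print(round(lower, 3), round(upper, 3))
```

The approximation is generally reasonable only for large samples, which matches the large-sample condition usually attached to proportion intervals.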

Figure 6.10
Example 6.11 Drawing a Conclusion about a Population
Mean Using a Confidence Interval
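In Example 6.8 we obtained a confidence interval for the
bottle-filling process of [790.12, 801.88].
The required volume is 800 mL and the sample mean is 796 mL.
Should machine adjustments be made?
Because 800 lies inside the interval, the sample does not provide
evidence that the process mean differs from the required volume.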

Using Confidence Intervals for Decision Making
Figure 6.5
Student Exercise 6.9
Data Set: Credit Risk Data

Problem:

Your Job:
Find the standard deviation of the total assets held by the bank.

Treating the records in the database as a population, use your sample from
Exercise 6.2 (see below) and compute 90%, 95%, and 99% confidence intervals
for the total assets held in the bank by loan applicants.

How does your confidence interval differ if you assume that the population
standard deviation is not known but is estimated from your sample data?

Your Job: From 6.2
The bank wants to sample from this database to conduct a more detailed audit.
Use the Excel Sampling tool to find a simple random sample of 20 unique records.
Example 6.13 Computing a Prediction Interval
Compute a 95% prediction interval for the revolving balances of
customers (Credit Approval Decisions)
Confidence Intervals and Sample Size
Prediction interval width = 22,585
Confidence interval width = 4,267 (from Example 6.9)
Example 6.14
Sample Size Determination for the Mean
In the liquid detergent example, the margin of error was 5.88 mL
(796 ± 5.88 gives the interval [790.12, 801.88]).
What sample size is needed to reduce the margin of error to
at most 3 mL? (A sample of 97 gives a margin of error of 2.985 mL.)
Figure 6.11
Round up to a sample size of 97.
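The sample size calculation can be sketched in Python with the usual formula n ≥ (z·σ/E)^2. The value σ = 15 is an assumption here: it is not stated on the slide, but it is the value consistent with the margins of error quoted in this chapter (5.88 for the original sample and 2.985 for n = 97).

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n whose z-based margin of error is at most margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# sigma = 15 is an assumed value inferred from the quoted margins of error
n_required = sample_size_for_mean(sigma=15, margin_of_error=3)

z95 = NormalDist().inv_cdf(0.975)
achieved_margin = z95 * 15 / math.sqrt(n_required)
print(n_required, round(achieved_margin, 3))
```

Rounding up rather than to the nearest integer matters: rounding down would leave the margin of error slightly above the 3 mL target.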
Student Exercise 6.17
Data Set: None

Problem:

Your Job:
A marketing study found that the mean spending in 15 categories of consumer
items for 297 respondents in the 18-34 age group was $71.86 with a standard
deviation of $70.90. For 736 respondents in the 35 plus age group the mean
and SD were $61.35 and $45.29. Develop 95% confidence intervals for the
mean spending amounts for each age group. What conclusions can you draw?
Student Exercise 6.20
Data Set: Restaurant Sales

Problem:

Your Job:
The data set contains lunch, dinner, and delivery sales for a local Italian restaurant.
Develop 95% prediction intervals for the dollar sales of each of these variables for
next Saturday.
Chapter 6 - Key Terms
Central limit theorem
Cluster sampling
Confidence interval
Convenience sampling
Degrees of freedom
Estimation
Estimators
Interval estimate
Judgment sampling
Level of confidence
Nonsampling error
Point Estimate
Population frame
Prediction interval
Probability interval
Sample proportion
Sampling (statistical) error
Sampling distribution of the mean
Sampling plan
Chapter 6 - Key Terms (continued)
Simple random sampling
Standard error of the mean
Stratified sampling
Systematic (or periodic) sampling
t-distribution

Learning the Basics of Sampling
Chapter 6: Learning Objectives
1. Describe the elements of a sampling plan.
2. Explain the importance of unbiased estimators.
3. Define the sampling distribution of the mean.
4. Calculate the standard error of the mean.
5. Explain the practical importance of the central limit
theorem.
6. Define and give examples of confidence intervals.
7. Describe the difference between the t-distribution
and the normal distribution.
8. Compute the sample size needed to ensure a
confidence interval for means and
proportions with a specified margin of error.
Chapter 6 Questions from Last Semester's Mid-Term or Final Exam
Details: BAS 120 Blackboard site at www.waketech.edu
BAS 120 Chapter 7: Statistical Inference
Discussion 9: How Big Data Helps Human Resources
Statistical Inference
Course: BAS 120
Chapter 7
Topics
Hypothesis Testing.
Understanding Risk.
Selecting Test Statistic.
Drawing Conclusions.
P Value.
Proportions.
Two Sample Testing.
Analysis of Variance (ANOVA).
Chi-Square Test for Independence.
Hypothesis Testing

Similar to Court of Law.
Note:
As in a court of law, our conclusions are
either to reject the null hypothesis
(the sample data provide sufficient
statistical evidence to support the
alternative hypothesis) or to fail to
reject the null hypothesis (the sample
data do not provide sufficient evidence
to support the alternative hypothesis).
Hypothesis Testing

Similar to Court of Law.
Procedure or Steps.
1. What are we trying to prove? What
are the charges?
2. How sure do we want to be?
3. What does the data or evidence
indicate?
4. Is the data or evidence compelling?
5. Does the evidence overcome
reasonable doubt?
6. What do we conclude? What is the
verdict?
Hypothesis Testing

Similar to Court of Law.
Procedure or Steps.
1. Identify the population parameter of
interest and formulate the hypotheses to
test. The null hypothesis is the current state.
2. Select a significance level or
confidence level (the "reasonable doubt").
3. Collect data and calculate the test
statistic.
4. Determine the critical value (based on
your level of confidence).
5. Compare the test statistic to the critical value.
6. Interpret the results.
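The six steps above can be sketched as a small Python function for the simplest case: a two-sided z test of a mean with a known population standard deviation. All numbers in the call (sample mean 83, hypothesized mean 80, σ = 10, n = 25) are hypothetical, and the function name is an assumption made for illustration.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z test of Ho: mu = mu0 when the population sigma is known."""
    # Steps 1-2: hypotheses (Ho: mu = mu0, H1: mu != mu0) and alpha are inputs.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # Step 3: test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)      # Step 4: critical value
    reject = abs(z) > critical                          # Step 5: compare
    return z, critical, reject                          # Step 6: interpret

# Hypothetical data: 25 test scores averaging 83, Ho: mean = 80, sigma = 10
z, critical, reject = one_sample_z_test(sample_mean=83, mu0=80, sigma=10, n=25)
print(round(z, 2), round(critical, 2), reject)
```

With these made-up numbers the test statistic is 1.5, which is inside the critical values of about ±1.96, so we fail to reject the null hypothesis.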
Why may it seem complicated?
Same theme, multiple applications.
Various Scenarios.
1. Are we comparing a sample to a population?
2. Are we comparing two samples?
3. Are we working with categorical or
numerical data?
4. Are we working with multiple samples?
5. Good news: the applications are
significant.
What if you could turn new signals into business value?
Brand Sentiment: Higher NPS
Predictive Maintenance: Less Downtime
Network Optimization: Lower Cost
Propensity to Churn: Greater Retention
Real-time Demand/Supply Forecast: More Efficient
360° Customer View: Loyal Customers
Asset Tracking: Increase Productivity
Personalized Care: Loyal Customers
Product Recommendation: More Sales
Risk Mitigation, Real-time: Retain Market Value
Insider Threats: Greater Security
Fraud Detection: Lower Risk
Discussion 10: Information is King
Hypothesis Testing: Step 1

Similar to Court of Law.
Step 1: State the Null and Alternative Hypotheses.

Null hypothesis: Ho    Alternative hypothesis: H1

Ho: The average test score = 80    H1: The average test score ≠ 80.
Note: two-sided test; the = always goes with the null hypothesis.

Ho: The average test score >= 80    H1: The average test score < 80.
Note: one-sided test.

Ho: The average test score <= 80    H1: The average test score > 80.
Note: one-sided test.
Hypothesis Testing: Step 1

One-Sided or Two-Sided
(Figure panels: Two-Sided, One-Sided)
Hypothesis Testing: Step 2

How much Type I error will you accept?
Step 2: Decide the significance level (how much risk are you
willing to live with; how much reasonable doubt).

Select an alpha: normally .01 or .05.

1 - alpha: the confidence coefficient, .99 or .95.

Often you will see these shown as percents: 99% or 95%.
Hypothesis Testing: Step 3

The test statistic depends on your distribution.
Step 3: Compute the value of the test statistic
from the sample data.
Hypothesis Testing: Step 4

The critical values depend on your distribution and
alpha.
Step 4: Determine the Critical Value(s) or the P Value.
Find the critical values based on alpha.
P Value: The probability of obtaining a test statistic value
equal to or more extreme than that obtained from the
sample data when the null hypothesis is true.
Note: for a two-sided test we use alpha/2.
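To make the alpha/2 note concrete, the sketch below computes standard normal critical values for one-sided and two-sided tests in Python. The function name and the `tail` labels are assumptions made for this illustration.

```python
from statistics import NormalDist

def z_critical_values(alpha, tail="two-sided"):
    """Critical value(s) of the standard normal distribution for a given alpha."""
    std_normal = NormalDist()
    if tail == "two-sided":
        upper = std_normal.inv_cdf(1 - alpha / 2)   # alpha is split between both tails
        return (-upper, upper)
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)         # all of alpha in the left tail
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)     # all of alpha in the right tail
    raise ValueError("tail must be 'two-sided', 'lower', or 'upper'")

print(z_critical_values(0.05))            # about (-1.96, 1.96)
print(z_critical_values(0.05, "upper"))   # about (1.645,)
```

Splitting alpha across two tails pushes the critical values farther from zero than in a one-sided test at the same alpha.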
Hypothesis Testing: Step 5

If the test statistic is outside the critical value(s), reject the null hypothesis.
Step 5: Compare the Test Statistic to the Critical Value(s)
If the test statistic is in the rejection region, reject the null
hypothesis.
Hypothesis Testing: Step 6

If the test statistic is outside the critical value(s), reject the null hypothesis.
Step 6: Interpret the Results
If the test statistic is in the rejection
region, reject the null hypothesis.

Note: if we do not reject, we have not proven the
null hypothesis true; we have only
shown that we cannot reject it.
Hypothesis Testing: Review

Similar to Court of Law.
Student Exercise 7.3
Data Set: Airport Service Times

Problem:

Your Job:
Determine if the airline can claim that the average service time is less than 2
minutes.
Student Exercise 7.6
Data Set: Colleges and Universities

Problem:

Your Job:
Formulate and test a hypothesis to determine whether there is statistical evidence
(at the 95% confidence level) that the graduation rate for either top liberal arts
colleges or research institutions in the sample exceeds 90%. Do the data support a
conclusion that the graduation rate exceeds 85%? Would the conclusion
change if the level of significance were .01 instead of .05?
Student Exercise 7.8
Data Set: Room Inspection

Problem:

Your Job:
Data for 100 room inspections at 25 hotels in a major chain. Management
would like the proportion of nonconforming rooms to be less than 2%. Test an
appropriate hypothesis to determine if management can make this claim.
Student Exercise 7.10
Data Set: Consumer Transportation Survey

Problem: Test the following null hypotheses:

Your Job:
Individuals spend at least 10 hours per week in their vehicles.
Individuals drive an average of 450 miles per week.
The average age of SUV drivers is no greater than 35.
At least 75% of individuals are satisfied with their vehicles.
Student Exercise 7.15
Data Set: Credit Risk Data

Problem: Test the following null hypotheses:

Your Job:
Number of months employed is the same for applicants with low credit risk as
those with high credit evaluations.
Hypothesis Testing: Risks

Similar to Court of Law.
As in court, sometimes the
jury gets it wrong.

It may let a guilty person go
free.

It may convict a truly
innocent person.
Hypothesis Testing: Risks

Similar to Court of Law.
Outcome | Actual truth: Innocent (Ho true) | Actual truth: Guilty (Ho false)
Fail to reject Ho (find innocent) | Correct | Type II error: guilty person goes free
Reject Ho (find guilty) | Type I error: innocent person found guilty | Correct
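One way to see what a Type I error rate means is to simulate many samples from a population where the null hypothesis really is true and count how often a test at alpha = .05 rejects anyway. The sketch below does this in Python; the population values (mean 80, σ = 10), sample size, trial count, and seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(trials=4000, n=25, mu0=80.0, sigma=10.0, alpha=0.05, seed=7):
    """Share of two-sided z tests that reject Ho even though Ho is true."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals mu0 (Ho is true).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
        if abs(z) > critical:
            rejections += 1   # an "innocent person found guilty"
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land near alpha
```

The observed rejection rate should sit close to alpha, which is the sense in which alpha is the Type I error risk we agree to accept.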
Hypothesis Testing: Test Statistic

Test Statistic.
Type of Test | Test Statistic
One-sample test for the mean, σ known | z = (x̄ - μ0) / (σ / √n)
One-sample test for the mean, σ unknown | t = (x̄ - μ0) / (s / √n), with n - 1 degrees of freedom
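The two one-sample statistics differ only in which standard deviation goes in the denominator. A short Python sketch, using a hypothetical list of five test scores:

```python
import math
import statistics

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic when the population standard deviation sigma is known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """Test statistic when sigma is unknown; the sample SD s stands in for it."""
    n = len(sample)
    s = statistics.stdev(sample)              # sample standard deviation (n - 1)
    t = (statistics.mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1                           # statistic and degrees of freedom

scores = [78, 82, 85, 79, 81]                 # hypothetical data
t, df = t_statistic(scores, mu0=80)
print(round(t, 4), df)
```

The t statistic is then compared with critical values from the t distribution with n - 1 degrees of freedom (T.INV in Excel) rather than the normal distribution.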
Hypothesis Testing: Step 5

If the test statistic is outside the critical value(s), reject the null hypothesis.
Step 5: Compare the Test Statistic to the Critical Value(s)
If the test statistic is in the rejection region, reject the null
hypothesis.
Hypothesis Testing: Step 6

Step 6: Interpret the Results
If the test statistic is in the rejection
region, reject the null hypothesis.

Note: if it is in the non-rejection
region, we do not reject. We
have not proven the null
hypothesis true; we have only
shown that we cannot reject it.
Student Exercise 7.23
Data Set: Cell Phone Survey

Problem:

Your Job:
Apply ANOVA to determine if the mean response for Value for the Dollar is
the same for different types of cell phones.
P Value

P Value is often used.
P Value: The probability of obtaining a test statistic value equal to
or more extreme than that obtained from the sample data when
the null hypothesis is true.
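For a z test statistic, the definition above translates directly into a tail area of the standard normal distribution. The Python sketch below is illustrative only; the function name and `tail` labels are assumptions, and the example value z = 1.5 is hypothetical.

```python
from statistics import NormalDist

def z_p_value(z, tail="two-sided"):
    """P value for a standard normal test statistic z."""
    cdf = NormalDist().cdf
    if tail == "two-sided":
        return 2 * (1 - cdf(abs(z)))   # extreme in either direction
    if tail == "upper":
        return 1 - cdf(z)              # H1: parameter is greater
    if tail == "lower":
        return cdf(z)                  # H1: parameter is smaller
    raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")

p = z_p_value(1.5)
alpha = 0.05
print(round(p, 4), "reject Ho" if p < alpha else "fail to reject Ho")
```

The decision rule is to reject the null hypothesis when the P value is smaller than alpha, which always agrees with the critical value comparison.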
Hypothesis Testing: Test Statistic

Test Statistic for One Sample Proportion.
Type of Test | Test Statistic
One-sample test for a proportion | z = (p̂ - π0) / √(π0(1 - π0)/n)
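A brief Python sketch of the one-sample proportion test statistic follows. The counts (60 successes in 100 trials) and the hypothesized proportion of 0.5 are hypothetical values chosen to keep the arithmetic easy to follow.

```python
import math
from statistics import NormalDist

def proportion_z_statistic(successes, n, pi0):
    """z statistic for Ho: population proportion = pi0."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion pi0, not p_hat.
    standard_error = math.sqrt(pi0 * (1 - pi0) / n)
    return (p_hat - pi0) / standard_error

z_prop = proportion_z_statistic(successes=60, n=100, pi0=0.5)
p_upper = 1 - NormalDist().cdf(z_prop)   # one-sided P value for H1: proportion > pi0
print(round(z_prop, 2), round(p_upper, 4))
```

Here the statistic is 2.0, which exceeds the one-sided 5% critical value of about 1.645, so these made-up data would lead us to reject the null hypothesis.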
 | Normal Distribution | T Distribution | Proportion
When to use | Population mean and standard deviation are known (rare in real life). | Population mean is known or assumed, but the population SD is unknown; use the sample SD (s) as a substitute. | Binary results: pass/fail, yes/no, or made/missed (independent, large sample).
Test statistic: what is it? | A standard score that measures how far our sample was from the mean.
Test statistic: how to calculate? | | |
Excel function | Critical values: NORM.INV; P-value: NORM.DIST | Critical values: T.INV; P-value: T.DIST | Critical values: NORM.S.INV; P-value: NORM.S.DIST
So Far

Using Samples to Make Inferences About a Population.
We have discussed how to compare a sample to a population.
But what if we want to compare two samples?

For example: sales of two shoe styles.
What if we have two samples?
Differences between sample means.
What if we want to compare one sample against
another sample to determine whether a statistically significant difference exists?
Performance of two suppliers?
Differences between samples.
Same steps: use the Excel t-Test tool under Data Analysis.
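To show what a two-sample t test computes, the Python sketch below calculates the test statistic and degrees of freedom for the unequal-variances (Welch) version of the test. Choosing the Welch form is an assumption made here; the slide does not say which t-Test variant is used. The supplier figures are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Two-sample t statistic and approximate df, not assuming equal variances."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / na + vb / nb)      # standard error of the difference in means
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t, df

# Hypothetical delivery times (days) for two suppliers
supplier_a = [10, 12, 14]
supplier_b = [20, 22, 24]
t_stat, df = welch_t(supplier_a, supplier_b)
print(round(t_stat, 3), round(df, 1))
```

The statistic is then compared with a t critical value for the computed degrees of freedom, following the same Step 4 to Step 6 logic as the one-sample tests.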
But what if we want to compare several samples?

If we run repeated t tests, our Type I error will increase; use ANOVA instead.
For example: sales from 5 regions.
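A quick calculation shows why repeated t tests inflate the Type I error. The Python sketch below counts the pairwise comparisons among the groups and computes the chance of at least one false rejection, under the simplifying assumption (made here, not in the slides) that the tests are independent.

```python
from math import comb

def familywise_error_rate(num_groups, alpha=0.05):
    """Number of pairwise tests and the chance of at least one false rejection."""
    num_tests = comb(num_groups, 2)            # every pair of groups gets a t test
    # Assumes independent tests, which is a simplification for illustration.
    return num_tests, 1 - (1 - alpha) ** num_tests

num_tests, fwer = familywise_error_rate(num_groups=5)
print(num_tests, round(fwer, 3))
```

With 5 regions there are 10 pairwise tests, and the chance of at least one false alarm rises to roughly 40% instead of 5%. ANOVA avoids this by testing all the means in a single test.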
Comparing multiple samples?
Differences between multiple samples.
What if we want to compare multiple samples to determine whether
statistically significant differences exist?
Let's take a visual look at ANOVA.
Differences between samples.
http://web.utah.edu/stat/introstats/anovaflash.html
Performance of multiple suppliers?
Differences between multiple samples.
Same steps: use the Excel ANOVA tool under Data Analysis.
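The F statistic that a single-factor ANOVA reports can be sketched in a few lines of Python. The three groups below are hypothetical numbers picked to keep the arithmetic simple; the function name is an assumption for illustration.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a single-factor ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of observations around their own group mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three suppliers
f_stat, df_between, df_within = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2), df_between, df_within)
```

A large F (here about 13 with 2 and 6 degrees of freedom) means the group means differ by much more than the variation within groups would explain, which is evidence against the hypothesis that all means are equal.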
Student Exercise 7.26
Data Set: Graduate School Survey

Problem:

Your Job:
Perform a chi-square test for independence to determine if plans to attend
graduate school are independent of gender.
But what if we want to compare several samples that are
proportions?

We can test for independence using the chi-square test.
For example: brand preferences among various groups.
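The chi-square statistic compares the observed counts in a cross-tabulation with the counts expected if the two variables were independent. A minimal Python sketch, using a hypothetical 2 x 2 table of counts:

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are two groups, columns are two brand preferences
chi2, df = chi_square_statistic([[10, 20], [20, 10]])
print(round(chi2, 2), df)
```

For these made-up counts the statistic is about 6.67 with 1 degree of freedom, which exceeds the familiar 5% critical value of about 3.84, so independence would be rejected.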
Student Exercise 7.24
Data Set: Freshman College Data

Problem:

Your Job:
First, use ANOVA to determine whether the mean retention rate is
the same for all colleges over the four-year period. Second, use ANOVA to
determine whether the mean ACT and SAT scores are the same each year over all
colleges.
Chapter 7 - Key Terms
Alternative hypothesis
Analysis of variance (ANOVA)
Chi-square distribution
Chi-square statistic
Confidence coefficient
Factor
Hypothesis
Level of significance
Null hypothesis
One-sample hypothesis tests
One-tailed test of hypothesis
p-value (observed significance level)
Power of the test
Statistical inference
Two-tailed tests of hypothesis
Type I error
Type II error
Learning the Basics of Statistical Inference
Chapter 7: Learning Objectives
1. Explain the purpose of hypothesis testing.
2. Explain the difference between Type I and Type II errors.
3. Show how to increase the power of a test.
4. Use the p-value to draw conclusions about a hypothesis test.
5. Explain the purpose of analysis of variance.
6. Use the Excel ANOVA tool.
7. Conduct and interpret the results of a chi-square test for
independence.
Discussion 11: Tax Agencies
Discussion 12: Career Discussion
Discussion 13: Analytics for Insurance
Discussion 14: Help us Improve this course
Details: BAS 120 Blackboard site at www.waketech.edu
A1: Boot Camps and Reference
Key Sources
Blackboard video links, also saved as favorites on my
YouTube channel:
Excel: Series of Videos (ExcelForFun)
Stats: Series of Videos (JBStats)
Math Basics: Series of Videos (MathTV)
A2: Past Final Exam Questions
Detailed by Chapter
Chapter 1 Questions from Last Semester's Mid-Term or Final Exam
Chapter 2 Questions from Last Semester's Mid-Term or Final Exam
Chapter 3 Questions from Last Semester's Mid-Term or Final Exam
Chapter 4 Questions from Last Semester's Mid-Term or Final Exam
Chapter 5 Questions from Last Semester's Mid-Term or Final Exam
Chapter 6 Questions from Last Semester's Mid-Term or Final Exam
Chapter 7 Questions from Last Semester's Mid-Term or Final Exam
