Fundamentals of Data Science
Fall 2017
Daniel Egger
Incomplete Information
Business Goals
From the point of view of business, better decisions are those that:
(1) Increase Revenues;
(2) Improve Profitability (by reducing costs of delivering goods or services,
or otherwise increasing efficiency); and
(3) Reduce Risk.
In Bayesian Logical Data Analysis (our approach in this course) models are
generally represented as Probability Distributions.
When a model is a probability distribution, updating the distribution based on new
data is done using Bayes' Theorem.
A completely automated system in which new data inputs continually update outputs
(for example, the offers an online customer sees, or the buy and sell orders an
algorithmic trading system issues) is often called a data pipeline. It is an
engineered system.
Data Scientists do Projects manually, by executing steps (1)-(5) above; they also
design and build Pipelines, in which all the steps (1)-(8) occur automatically.
Linear Regression Models are Probabilistic
A. Review of basic probability theory concepts, up to and including how to use new
data and Bayes' Theorem to update a probability distribution, the model for all
machine learning.
B. Information Measures: Shannon entropy, and how it can be used both as a metric
to quantify the reduction in uncertainty (information gain) provided by a model and
to compare alternative models to determine which is most effective.
E. Insights into study design, including the most common errors that cause most
research not to be reproducible, and how to avoid them.
F. A realistic practice project evaluating Credit Card applications that uses both:
Binary Classification, to select applicants based on forecasting default or no
default; and Linear Regression, to select applicants based on forecasting
future profitability.
Lecture 1, Part Two:
Probability
Degree of belief in the truth of a statement.
[Definition from Bayesian Logical Data Analysis; see Cox Axioms, MacKay p. 26]
All other statements have some degree of uncertainty and are assigned probabilities
that are real numbers greater than 0 and less than 1.
Notation: 0 < p(x) < 1.
Negation
Probability Distribution
Exclusive means no more than one statement can be true with certainty, given
complete information.
Exhaustive means at least one statement must be true with certainty, given
complete information.
Given complete information, exactly one statement in a probability distribution is
true with certainty and the others are false with certainty.
$X = \{x_1, x_2, x_3, \ldots, x_n\}$, $Y = \{y_1, y_2, y_3, \ldots, y_m\}$
Principle of Indifference
It follows from the principle of indifference that, in the absence of any distinguishing
information, the probability of a particular outcome =
(the number of events that meet the relevant definition for that outcome) / (the
total number of possible events).
A Universe (also called a Sample Space) is the set of all possible outcomes.
It follows from the definitions of probability and the Principle of Indifference that in
the absence of additional information, the probability of any outcome is the number
of events it contains, divided by the total number of events in the universe.
For example: when tossing a fair six-sided die, the outcome "the result is even"
contains three different events: 2, 4, and 6. Because by the principle of indifference
each event must have probability 1/6, the probability of the outcome "the result is
even" is 3/6 = 1/2.
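The counting rule above can be sketched in a few lines of Python. This is an illustrative sketch only: `prob_by_counting` is a hypothetical helper name, and the die is assumed fair so that every event in the universe is equally likely.

```python
from fractions import Fraction

def prob_by_counting(universe, outcome):
    """Hypothetical helper: principle of indifference,
    p(outcome) = (events matching the outcome) / (all events in the universe)."""
    matching = [event for event in universe if outcome(event)]
    return Fraction(len(matching), len(universe))

die = [1, 2, 3, 4, 5, 6]  # universe for one toss of a fair six-sided die (assumed fair)
p_even = prob_by_counting(die, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```

Using `Fraction` keeps the answer exact (3/6 reduces to 1/2) rather than a rounded decimal.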
Joint Probabilities
A joint probability $p(x_1, y_1)$ is the probability that both outcome $x_1$ from probability
distribution X AND outcome $y_1$ from probability distribution Y are True.
The joint probability distribution, written (X,Y), is the collection of all possible joint
outcomes pairing an outcome from X with an outcome from Y. Note that if X has n
outcomes and Y has m outcomes, then the joint distribution (X,Y) is a new
probability distribution with n*m outcomes.
For example: if I toss a six-sided die once, and flip a coin (Heads/Tails) once, the
joint distribution would contain 12 outcomes, to which probabilities would be
assigned:
1 and Heads 2 and Heads 3 and Heads 4 and Heads 5 and Heads 6 and Heads
1 and Tails 2 and Tails 3 and Tails 4 and Tails 5 and Tails 6 and Tails
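The joint distribution in this example can be enumerated directly. This sketch assumes a fair die and a fair coin, so each of the n*m = 12 joint outcomes is assigned 1/12; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]    # X: n = 6 outcomes
coin = ["Heads", "Tails"]   # Y: m = 2 outcomes

# Joint distribution (X, Y): every pairing of a die face with a coin face.
joint = list(product(die, coin))

# Assumption: fair die and fair coin, so each joint outcome gets 1/(n*m).
p_joint = {pair: Fraction(1, len(joint)) for pair in joint}

print(len(joint))              # 12
print(p_joint[(3, "Heads")])   # 1/12
```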
The probability of a joint outcome, one from X and one from Y, does not depend on the
order in which X and Y are written: p(X,Y) = p(Y,X).
Independence
X and Y are independent if their joint distribution equals the product distribution:
p(X,Y) = p(X)p(Y). If the joint distribution p(X,Y) does not equal the product
distribution p(X)p(Y), then X and Y are dependent.
For example: suppose that over many years the expected number of hot (100°F, 37.78°C)
days in Durham is 40 per year, or 10.96%, and the expected number of rainy days is 75,
or 20.55%. If hot days and rainy days are independent, the expected probability of a
hot-and-rainy day is (10.96%)(20.55%), or 2.25%, so the average number of
hot-and-rainy days per year would be 8.2.
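The arithmetic in the Durham example can be reproduced as follows. The 365-day year is an assumption inferred from the percentages in the text (40/365 ≈ 10.96%), and independence of hot and rainy days is the example's own assumption.

```python
DAYS_PER_YEAR = 365  # assumption: percentages in the text are per 365-day year

p_hot = 40 / DAYS_PER_YEAR     # ~0.1096
p_rainy = 75 / DAYS_PER_YEAR   # ~0.2055

# Only valid if hot and rainy days are independent: p(hot AND rainy) = p(hot) * p(rainy).
p_hot_and_rainy = p_hot * p_rainy
expected_days = p_hot_and_rainy * DAYS_PER_YEAR

print(f"{p_hot_and_rainy:.2%}")  # 2.25%
print(f"{expected_days:.1f}")    # 8.2
```

If the observed count of hot-and-rainy days differed substantially from this expectation, that would be evidence that the two are dependent.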
Venn Diagrams Are Sometimes Used to Represent Probabilities
The area of the rectangle represents the Universe of all possible outcomes.
It has area 1.
The areas of the circles represent p(A) and p(B).
Intersection
P(A AND B)
The Joint Probability $p(x_1, y_1)$ can be represented as the intersection of the two
probabilities $p(x_1)$ and $p(y_1)$.
Union
P(A OR B)
Example: The probability that a fair six-sided die will come up 3 OR a fair coin will
come up heads [or both] = (1/6) + (1/2) - ((1/6)(1/2))
= 2/12 + 6/12 - 1/12
= 7/12.
Example: The probability that a fair coin will come up heads at least once in two
tosses is (1/2) + (1/2) - (1/4) = 3/4.
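Both union examples can be checked in Python: once with the rule p(A OR B) = p(A) + p(B) - p(A AND B), and once by counting joint outcomes directly. `p_union` is a hypothetical helper name, and the die and coin are assumed fair and independent, as in the examples.

```python
from fractions import Fraction
from itertools import product

def p_union(p_a, p_b, p_a_and_b):
    """Hypothetical helper: p(A OR B) = p(A) + p(B) - p(A AND B)."""
    return p_a + p_b - p_a_and_b

# Die shows 3 OR coin shows heads; independent, so p(A AND B) = (1/6)(1/2).
p_a, p_b = Fraction(1, 6), Fraction(1, 2)
die_or_coin = p_union(p_a, p_b, p_a * p_b)
print(die_or_coin)  # 7/12

# Cross-check by counting the 12 equally likely joint outcomes.
joint = list(product(range(1, 7), ["H", "T"]))
hits = [(d, c) for d, c in joint if d == 3 or c == "H"]
print(Fraction(len(hits), len(joint)))  # 7/12

# Heads at least once in two tosses of a fair coin.
half = Fraction(1, 2)
at_least_one_head = p_union(half, half, half * half)
print(at_least_one_head)  # 3/4
```

Subtracting p(A AND B) prevents the "both" case from being counted twice.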
Complement
P(~A)
P(~A) = 1 - p(A).
Example: the outcomes of tossing a fair coin twice are:
1) 2 Heads
2) 1 Head, 1 Tail (2 separate events in this Outcome)
3) 2 Tails
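The complement rule can be checked against the three outcomes listed above, assuming (as the list suggests) that they are the outcomes of two tosses of a fair coin. The sketch enumerates the four equally likely events and groups them by number of heads; names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Universe for two tosses of a fair coin: four equally likely events.
tosses = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# Group events into the three outcomes by number of heads.
outcomes = Counter(seq.count("H") for seq in tosses)
print(outcomes[1])  # 2: "1 Head, 1 Tail" contains two separate events

p_two_tails = Fraction(outcomes[0], len(tosses))  # 1/4
p_at_least_one_head = 1 - p_two_tails             # complement rule: p(~A) = 1 - p(A)
print(p_at_least_one_head)  # 3/4
```

This agrees with the union calculation for "heads at least once in two tosses", reached here by a different route.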
Drawing With Replacement versus Without Replacement
Examples
When choosing one of the 1000 three-character strings between 000 and 999 at
random, the digits {0, 1, 2, ..., 9} are each used with replacement: they can occur
more than once.
When drawing cards from a deck to make a five-card poker hand, the cards are
drawn without replacement: if you draw the Ace of Clubs on the first card, there is
no chance of drawing the Ace of Clubs on subsequent draws, because it has already
been removed from the deck. So, when calculating the probability of drawing an Ace
on the second draw after drawing an Ace on the first draw, the probability is 3/51:
(number of remaining Aces in the deck) / (number of remaining cards in the deck).
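A short sketch of the card calculation, assuming a standard 52-card deck with 4 Aces. The last line multiplies the two draw probabilities to get the chance that the first two cards are both Aces, which follows from the same reasoning.

```python
from fractions import Fraction

DECK_SIZE, ACES = 52, 4  # assumption: standard deck

# Without replacement: the first Ace is gone before the second draw.
p_first_ace = Fraction(ACES, DECK_SIZE)                         # 4/52
p_second_ace_given_first = Fraction(ACES - 1, DECK_SIZE - 1)    # 3/51
print(p_second_ace_given_first)  # 1/17 (3/51 reduced)

# Probability that both of the first two cards are Aces.
p_two_aces = p_first_ace * p_second_ace_given_first
print(p_two_aces)  # 1/221
```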
Urns
Urns are often used in school probability problems. They are imaginary containers
where the contents consist of black and white marbles that cannot be seen and may
be drawn out one at a time with equal probability of drawing any that remain in the
container.
Assume an urn contains exactly three marbles, two white and one black. The
probability of drawing two white marbles in a row with replacement is (2/3)(2/3) =
4/9.
The probability of drawing two white marbles in a row without replacement is not
the same. The two draws are now dependent, because the outcome of the first
changes the probabilities of white and black on the second.
On the first draw the probabilities are the same: p(white) = 2/3, p(black) = 1/3.
However, if you draw white first, the changed probabilities on the second draw are:
p(white) = 1/2, p(black) = 1/2.
So the probability of white on the first, and on the second, is (2/3)*(1/2) = 1/3.
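The two urn results can be confirmed by listing every equally likely ordered pair of draws. The marbles are labeled 0, 1, 2 so that the two white marbles stay distinct; `p_two_white` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations, product

urn = ["white", "white", "black"]  # two white marbles, one black
indices = range(len(urn))          # label marbles 0, 1, 2

def p_two_white(draws):
    """Hypothetical helper: fraction of ordered draw sequences that are white, white."""
    hits = sum(1 for a, b in draws if urn[a] == urn[b] == "white")
    return Fraction(hits, len(draws))

with_replacement = list(product(indices, repeat=2))   # 9 sequences; a marble may repeat
without_replacement = list(permutations(indices, 2))  # 6 sequences; no marble repeats

print(p_two_white(with_replacement))     # 4/9
print(p_two_white(without_replacement))  # 1/3
```

`product` allows the same marble to be drawn twice, which models replacement, while `permutations` forbids it, which models drawing without replacement.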
For example, to calculate in Excel the probability of drawing exactly 2 black marbles
in 20 draws from a population of 100 marbles containing 10 black marbles,
use the function HYPGEOM.DIST(2, 20, 10, 100, FALSE) = 31.8%.
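An equivalent of this Excel call can be sketched with Python's standard library (Python 3.8+ for `math.comb`). `hypergeom_pmf` is a hypothetical helper; its argument order mirrors HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, FALSE), using the symbols s, n, M, N from the hypergeometric discussion in these notes.

```python
from math import comb

def hypergeom_pmf(s, n, M, N):
    """Hypothetical helper: probability of exactly s successes in n draws without
    replacement from a population of N items containing M successes."""
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

p = hypergeom_pmf(s=2, n=20, M=10, N=100)
print(f"{p:.1%}")  # 31.8%
```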
Selecting an item from a set of m possibilities n times with replacement when the
order matters (permutation) can happen in m^n ways.
For example:
Drawing from the set {Heads, Tails} two times has 2^2 = 4 permutations.
Drawing from the set {0, 1, 2, ..., 9} three times has 10^3 = 1000 permutations.
Drawing from a 52-card deck five times with replacement has 52^5 = 380,204,032
permutations.
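The m^n rule for ordered draws with replacement is a one-liner; the sketch below also brute-forces the coin case with `itertools.product` to confirm the formula. `perms_with_replacement` is a hypothetical helper name.

```python
from itertools import product

def perms_with_replacement(m, n):
    """Hypothetical helper: ordered selections of n items from m possibilities,
    repeats allowed, equals m ** n."""
    return m ** n

print(perms_with_replacement(2, 2))   # 4
print(perms_with_replacement(10, 3))  # 1000
print(perms_with_replacement(52, 5))  # 380204032

# Brute-force check: list every ordered sequence of two coin flips.
coin_sequences = list(product(["Heads", "Tails"], repeat=2))
print(len(coin_sequences))  # 4
```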
To calculate how many ways we can draw from a set of m possibilities n times
without replacement when the order matters (permutation), we use factorial
notation.
Permutations: m!/(m-n)!
[What does this mean? Factorial notation writes 5*4*3*2*1 = 120 as 5!, read "five
factorial", 6*5*4*3*2*1 = 720 as 6!, read "six factorial", and so on. To write a
product such as 7*6, we can write 7!/5!, because this is equal to
(7*6*5*4*3*2*1)/(5*4*3*2*1) and all terms but 7*6 cancel out.
In addition, by convention, we set 0! = 1.]
For example, to draw 5 cards from a 52-card deck without replacement when the order
matters, there are 52!/(52-5)! = 311,875,200 permutations.
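The factorial formula m!/(m-n)! can be checked against `math.perm` (available in Python 3.8+), which counts ordered selections without replacement directly.

```python
from math import factorial, perm

m, n = 52, 5
via_factorials = factorial(m) // factorial(m - n)  # 52!/47!
via_perm = perm(m, n)                              # same count from the standard library
print(via_factorials, via_perm)  # 311875200 311875200

# The cancellation shortcut: 7!/5! leaves only 7*6.
print(factorial(7) // factorial(5))  # 42

# Convention: 0! = 1, which makes m!/(m-n)! work when n = m.
print(factorial(0))  # 1
```

Integer division (`//`) keeps the result exact, since the quotient of these factorials is always a whole number.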
Combinations
Often we don't care about the order, but only about the resulting group, or
combination. The number of unique combinations when drawing from a set of m
things n times without replacement is
m!/((m-n)! n!).
This formula is used so frequently in probability that it has its own name, "m
choose n", and special notation, $\binom{m}{n}$.
The number of unique sets when drawing from the Urn (or set) {black, white} twice
without replacement = $\binom{2}{2}$ = 2!/(0!2!) = 1 combination.
The number of unique groups of numbers when drawing from the set {0, 1, 2, ..., 9} 3
times without replacement is $\binom{10}{3}$ = 10!/(7!3!) = 120 combinations.
The number of unique five-card poker hands when drawing from a 52-card deck
without replacement is $\binom{52}{5}$ = 52!/(47!5!) = 2,598,960 combinations. Note that this
is the number of permutations, 311,875,200, divided by the number of ways five cards
can be ordered, which is 5! = 120.
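The "m choose n" formula can be written directly from factorials and compared with `math.comb` (Python 3.8+). `m_choose_n` is a hypothetical helper name; the final check restates the note above, that combinations equal permutations divided by the n! orderings of each group.

```python
from math import comb, factorial, perm

def m_choose_n(m, n):
    """Hypothetical helper: unordered groups of n drawn from m without replacement,
    m! / ((m - n)! n!)."""
    return factorial(m) // (factorial(m - n) * factorial(n))

print(m_choose_n(2, 2))   # 1
print(m_choose_n(10, 3))  # 120
print(m_choose_n(52, 5))  # 2598960

# Combinations = permutations / orderings of each group.
print(perm(52, 5) // factorial(5) == comb(52, 5) == m_choose_n(52, 5))  # True
```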
Decks of cards contain four suits (hearts, diamonds, clubs, spades) of 13 cards each.
Suppose we want to know the probability of being dealt a flush (5 cards of the same
suit). We don't care about the order in which the cards are dealt, so this is a
combination problem.
The probability will be the ratio of combinations that are flushes to total
combinations possible.
The total number of combinations that are flushes is the number of ways you can
draw 5 cards from the 13 cards of one suit without replacement, multiplied by 4.
This is 4*$\binom{13}{5}$ or 5,148.
The total number of combinations possible is $\binom{52}{5}$ or 2,598,960.
The probability is 5,148/2,598,960 = 0.001981.
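The flush calculation as a ratio of combinations, sketched with `math.comb` (Python 3.8+). As in the text, this count of 5,148 includes straight flushes, since any 5 cards of one suit qualify.

```python
from math import comb

SUITS, CARDS_PER_SUIT = 4, 13

flushes = SUITS * comb(CARDS_PER_SUIT, 5)    # 4 * 1287
all_hands = comb(SUITS * CARDS_PER_SUIT, 5)  # 52 choose 5
p_flush = flushes / all_hands

print(flushes, all_hands)  # 5148 2598960
print(round(p_flush, 6))   # 0.001981
```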
Revisiting the Hypergeometric Distribution
The total number of distinct ways that 2 black marbles can be chosen from the 10 in the
population is (M choose s) or $\binom{10}{2}$.
The total number of ways that the remaining 18 available slots (18 because 20 were
drawn and 2 are occupied by black marbles) can be filled by the 90 white marbles in
the population is ((N-M) choose (n-s)) or $\binom{90}{18}$.
The numerator is the total number of relevant events, the ways that the 18 white
and 2 black can occur together: the product $\binom{10}{2}\binom{90}{18}$
= 45*3.78965*10^18 = 1.70534*10^20.
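The denominator is the total number of events, the ways 20 draws can be taken
from the population of 100 marbles without regard to color: (N choose n) = $\binom{100}{20}$ =
5.35983*10^20.
The probability is therefore 1.70534*10^20 / 5.35983*10^20 = 0.318, or 31.8%, matching
the HYPGEOM.DIST result.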
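The numerator and denominator of the hypergeometric probability can be computed piece by piece with `math.comb` (Python 3.8+), using the symbols from the text: N = 100 marbles, M = 10 black, n = 20 draws, s = 2 black drawn. Python integers are exact, so the large counts are computed without rounding; compare the printed values with the figures above.

```python
from math import comb

N, M, n, s = 100, 10, 20, 2  # population, black in population, draws, black drawn

black_ways = comb(M, s)          # ways to choose the 2 black marbles
white_ways = comb(N - M, n - s)  # ways to fill the other 18 slots with white marbles
numerator = black_ways * white_ways
denominator = comb(N, n)         # all ways to draw 20 of 100, ignoring color

print(black_ways)  # 45
print(f"{white_ways:.5e}  {numerator:.5e}  {denominator:.5e}")
print(f"{numerator / denominator:.1%}")  # 31.8%
```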
https://www.coursera.org/learn/datasciencemathskills