
Probability and Statistics

Nisheeth Bandaru

© 2014 Nisheeth Bandaru


Copyright
nisheeth@outlook.com

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
First printing, Oct 2014

Contents

1 Introduction to Probability
1.1 Introduction
1.2 Basic Probability Concepts
1.3 Permutations and Combinations
1.4 Axioms of Probability
1.4.1 Addition Rule
1.4.2 Conditional Probability
1.4.3 Independent Events


1. Introduction to Probability

1.1 Introduction
What is statistics and why should I learn it? In many fields of work, we are often tasked with the challenge of obtaining, interpreting and presenting data. A product team at a software company might want to know which features of the software product are used most often. Massive amounts of data, collected from customers who agree to share their usage information anonymously, need to be analysed to further improve the user interface.1 A scientist recording observations in an experiment would want to come up with a model that explains the data. A model is a theoretical explanation of the phenomenon under study. This should be converted to a mathematical equation, for to test the validity of the model, a prediction must be made via the model and then confirmed by experiment.
Statistics provides a methodical way of making decisions based on the analysis of well-crafted experiments. Since an experiment is seldom perfect, it also provides methods to assess the degree of uncertainty in our findings. These techniques can be broadly classified into descriptive statistics and inferential statistics. Descriptive statistics is all about techniques, analytical and graphical, that paint a picture of the data. Inferential statistics draws conclusions about a population based on observations of a representative sample of that population. This brings us to the first definition:
Definition 1.1.1 Population and Sample. A group of entities or objects which we would like to study is called a population. A subset of the population, on which conclusions are actually made, is called a sample.
Ideally, we would want to work with the entire population. Almost always, however, this
is impractical. Our hope is that, through proper use of statistics, the sample is a representative
description of the entire population.
The mathematics of statistics is built on probability theory. Therefore, this and the next few
chapters of this book will review the concepts of probability.
1 Data influencing the Ribbon interface in Microsoft Office 2007: http://blogs.msdn.com/b/jensenh/archive/2006/04/05/568947.aspx


1.2 Basic Probability Concepts
In popular media, probability is usually quoted as a percentage chance of something occurring. For example, a weather report might say there is a 75% chance of precipitation today. In formal probability theory, however, probability is specified as a number between 0 and 1, inclusive. So after reading this book, we would say there's a 0.75 chance of rain. Quite intuitively, numbers near 0 indicate a very small (but not zero) chance of an event occurring. Similarly, numbers near 1 indicate a high (but not certain) likelihood of that event taking place. Events with probability near 0.5 are about as likely to occur as not.
Let us now see how to actually compute probabilities. This calls for the use of two terms, sample space and events. An event is a collection of outcomes of a particular experiment. The sample space is a set of all possible outcomes, such that each outcome corresponds to exactly one element of the set. Any particular outcome of the set is called a sample point.
Definition 1.2.1 Sample space and Events. A sample space is a set S whose elements each correspond to a unique possible outcome of an experiment. An element of S is called a sample point.
Any subset of S is called an event. This includes the empty set {} (the impossible event) and S itself (the certain event).
It is easy to list a sample space when the number of possible outcomes is small. Imagine a chess tournament where three matches are played to decide the winner. For each player, with w denoting a win and l denoting a loss, the sample space looks like:
S = {lll, llw, lwl, lww, wll, wlw, wwl, www}
Let's say we are interested in the event where a player wins at least two matches (and thus the tournament). This is the subset
E = {lww, wlw, wwl, www}
The probability of a player winning the tournament, i.e. of event E occurring, is the ratio of the number of ways E can occur to the total number of possibilities. That is,

P(E) = n(E) / n(S) = (number of ways E can occur) / (number of outcomes the experiment may lead to)    (1.1)

This is called the classical formula. For the above example, the probability of a player winning the tournament using this formula is 4/8 = 0.5, as expected. Notice, however, that this method may not always be appropriate. If I (a novice at chess) were to play against a professional chess player, surely I am less likely to win the tournament than my seasoned opponent! Use of the classical formula makes sense only when the outcomes of S are equally likely. A better approach in cases where this does not hold is the relative frequency approach. The idea is to run an experiment repeatedly and count the number of times the event of interest E occurs across all those repetitions:

P(E) = f / n = (number of times E has occurred) / (number of times the experiment was performed)    (1.2)

The disadvantage of this method is that the experiment must be repeatable. It might be tiring to hold several chess matches just to determine my winning odds.
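To make the two formulas concrete, here is a minimal Python sketch (illustrative, not part of the text) that computes P(E) for the chess example with the classical formula and then estimates it by relative frequency, assuming for the simulation that each match is won independently with probability 0.5:

import itertools
import random

# Sample space: all sequences of three matches, 'w' = win, 'l' = loss.
S = [''.join(t) for t in itertools.product('wl', repeat=3)]

# Event E: at least two matches are won.
E = [s for s in S if s.count('w') >= 2]

# Classical formula (1.1): valid because the 8 outcomes are equally likely.
print(len(E) / len(S))  # 4/8 = 0.5

# Relative frequency (1.2): simulate n tournaments, each match won
# independently with probability p (p = 0.5 mirrors the fair case).
p, n = 0.5, 100_000
f = sum(sum(random.random() < p for _ in range(3)) >= 2 for _ in range(n))
print(f / n)  # close to 0.5, and approaches it as n grows

With equally likely outcomes both numbers agree; if the matches were not fair (p other than 0.5), only the relative frequency estimate would remain meaningful.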

Example 1.1 Let us take the same chess tournament example. Suppose I am interested in the events
A: the first match is a win
B: the second match is a win
C: the third match is a win
Looking at S, and expressing the three events mathematically:
A = {wll, wlw, wwl, www}
B = {lwl, lww, wwl, www}
C = {llw, lww, wlw, www}
The event where either the first or the last match is a win is expressed by the union operator:
A ∪ C = {wll, wlw, wwl, www, llw, lww}
The event where both the first and second matches are wins is expressed by the intersection operator:
A ∩ B = {wwl, www}
The event where the first two matches are wins, but the third is a loss, is given by
(A ∩ B) ∩ C′ = {wwl}
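Python's set operators mirror these operations directly; a short illustrative sketch reproducing Example 1.1 (with the complement C′ computed as S - C):

# Sample space and the events of Example 1.1 as Python sets.
S = {'lll', 'llw', 'lwl', 'lww', 'wll', 'wlw', 'wwl', 'www'}
A = {s for s in S if s[0] == 'w'}  # first match is a win
B = {s for s in S if s[1] == 'w'}  # second match is a win
C = {s for s in S if s[2] == 'w'}  # third match is a win

print(A | C)              # union: first or third match is a win
print(A & B)              # intersection: first and second are wins
print((A & B) & (S - C))  # first two are wins, third is a loss -> {'wwl'}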
One case of interest is when two or more events cannot occur at the same time; the occurrence of one event precludes the others. Such events are called mutually exclusive. In the above example, the events where at least two matches are won and where at least two are lost are clearly mutually exclusive.
Definition 1.2.2 Mutually exclusive events. Two or more events E1, E2, E3, … are said to be mutually exclusive if and only if Ei ∩ Ej = ∅ for any i ≠ j.

1.3 Permutations and Combinations
As the sample space grows, it becomes tedious to enumerate all possibilities manually. We need a mathematical way of counting quickly. The two counting problems are:
Combinations: selecting items from a collection without regard to the order of selection.
Permutations: arranging items where the order of arrangement matters.
Furthermore, in each of the two cases, repetition of items may or may not be allowed. Which counting technique applies is determined by the context of the problem. I'll start with examples and then provide the generalized expression for each case. Let me begin with permutations when repetition is allowed.
Example 1.2 Addresses in a computer's memory are usually denoted by programmers in the hexadecimal (base 16) number system, whose set of 16 digits is 0-9 and A-F. One example address in a 16-bit wide register is 0x73FF (the leading 0x denotes that the number is in base 16).
How many addresses are possible if the address is four hexadecimal digits long? We have 16 possibilities for the first digit, 16 for the second, 16 for the third and 16 for the fourth. Therefore, the total number of possible addresses is:
16 × 16 × 16 × 16 = 65536
In general, if we have a collection of n items and need to pick k of them, we have n options for the first, n options for the second, n options for the third, and so on for all k items. That is, the total number of possible permutations is:
n × n × ⋯ × n (k factors) = n^k

Definition 1.3.1 Permutation with repetition. If k objects are to be picked from a collection of n items, the number of permutations when repetition of objects is allowed is n^k.
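As a quick illustrative check of the n^k rule, one can compute the address count of Example 1.2 directly and by brute-force enumeration:

import itertools

# Permutations with repetition: n ** k choices in total.
n, k = 16, 4                # 16 hex digits, 4-digit addresses
print(n ** k)               # 65536

# Brute-force cross-check by enumerating every possible address.
print(sum(1 for _ in itertools.product(range(n), repeat=k)))  # 65536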
What about the case when repetition is not allowed? In the general case, once we have picked the first item among n options, we are left with (n − 1) possibilities for the second. The third item can be chosen in (n − 2) ways, and so on:
n × (n − 1) × (n − 2) × ⋯ × (n − (k − 1))
Multiplying and dividing by (n − k) × (n − k − 1) × ⋯ × 3 × 2 × 1, and recognizing that r × (r − 1) × (r − 2) × ⋯ × 3 × 2 × 1 is r!, we get
[n × (n − 1) × (n − 2) × ⋯ × (n − k + 1) × (n − k) × (n − k − 1) × ⋯ × 3 × 2 × 1] / [(n − k) × (n − k − 1) × ⋯ × 3 × 2 × 1] = n! / (n − k)!
This quantity is expressed more simply as nPk = n! / (n − k)!.
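Python's standard library exposes this quantity directly (math.perm, available since Python 3.8); a small sketch, reusing the hexadecimal example but with no digit allowed to repeat:

import math

# Permutations without repetition: nPk = n! / (n - k)!
# e.g. 4-digit hex addresses in which no digit appears twice.
n, k = 16, 4
print(math.perm(n, k))                             # 43680
print(math.factorial(n) // math.factorial(n - k))  # same value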

Example 1.3 Life is possible because of twenty different amino acids (the standard amino acids) which go on to compose all the proteins in our body. Each amino acid is encoded by a word in the RNA made from four nucleotides abbreviated by the letters A, U, G and C.
How many letters are required in each word to be able to represent all 20 amino acids?
Let's start with 2-letter words. If we had two-letter words, we have 4 options for the first letter and 4 options for the second, totalling 16 possibilities. Clearly, 2 isn't sufficient. What about 3-letter words? We have 4 × 4 × 4 = 64 possibilities; more than the requirement of 20. And indeed, amino acids are encoded by 3-letter words called codons (such as AUG, AAG, …).
How many of these 64 words have a repeated nucleotide?
It is easier to count the case where there is no repetition. That is 4 × 3 × 2 = 24. Therefore, the number of codons with a repeated nucleotide is 64 − 24 = 40.
As described earlier, each codon codes for a particular amino acid. Peptide bonds fuse the amino acids to form the primary structure of a protein. The same set of amino acids, when bonded in a different order, forms a different protein. How many protein primary structures are possible with 5 different amino acids?
5 × 4 × 3 × 2 × 1 = 5! = 120. In general, k unique items can be arranged in k! ways.
The primary structure isn't the endpoint of protein structure. This sequence determines whether the chains twist or remain straight, forming the secondary structure. These twists and turns further interact with themselves or with those on other polypeptides to form a three-dimensional (tertiary) structure and a quaternary structure. This mind-boggling number of possibilities ultimately dictates the biology at the cellular level.2
2 To learn more about the secret of life, follow this link: http://www.700x.org/
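The counts in Example 1.3 can be verified in a few lines; this is just an arithmetic check of the reasoning above:

import math

print(4 ** 3)                # 64 codons of length 3 from the letters A, U, G, C
no_repeat = math.perm(4, 3)  # 24 codons with all nucleotides distinct
print(no_repeat)
print(4 ** 3 - no_repeat)    # 40 codons with a repeated nucleotide
print(math.factorial(5))     # 120 orderings of 5 distinct amino acids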

Combinations

Permutation is a two-step process: (i) selecting k objects from a collection of n items and (ii) arranging them in k! possible ways. If only the first step of the process is of interest, we have a combination problem. To reiterate, combinations is the problem of selecting items without any regard to the order of selection. For instance, selecting 11 cricket players for a match from a pool of several competent players is a combination problem; the number of ways of determining the batting order of those 11 players is a permutation problem.
Mathematically, a combination is represented as nCk. Expressing the two-step process of permutation:
nPk = nCk × k!

Plugging in the expression for nPk,

nCk = n! / ((n − k)! × k!)    (1.3)

The above expression is for the case when repetition is not allowed.
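A sketch of the cricket example using math.comb; the pool size of 15 is my own illustrative assumption, since the text only says "several competent players":

import math

# Combinations (1.3): nCk = n! / ((n - k)! * k!)
n, k = 15, 11                     # hypothetical pool of 15, squad of 11
print(math.comb(n, k))            # 1365 possible squads (order irrelevant)
print(math.perm(n, k) // math.factorial(k))  # same value, via nPk / k!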

1.4 Axioms of Probability
We build probability theory starting with three axioms: statements that are assumed to be true by definition and require no proof.
1. If S denotes the sample space of an experiment, then P(S) = 1. That is to say, the sample space is exhaustive and no outcome can fall outside what is possible per the sample space.
2. For any event A in the sample space, P(A) ≥ 0.
3. If A1, A2, A3, … are a countable collection of mutually exclusive events, then the probability of one of these events occurring is the sum of the probabilities of each event: P(A1 ∪ A2 ∪ A3 ∪ ⋯) = P(A1) + P(A2) + P(A3) + ⋯
What follows next is a logical extension of ideas based on these fundamental axioms.
First, using the fact that S and ∅ (the empty set) are mutually exclusive, together with axiom 1, we have

P(∅) = 0    (1.4)

Similarly, an event A and its complement A′ (A not occurring) are mutually exclusive and together exhaust S, so

P(A′) = 1 − P(A)    (1.5)

1.4.1 Addition Rule
Suppose we have two events A and B which are not mutually exclusive, that is, there is a chance that both may occur [P(A ∩ B) > 0]. The addition rule says

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)    (1.6)

Both P(A) and P(B) include the intersection P(A ∩ B); hence the subtraction on the right side, to account for the double counting.
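Both the complement rule (1.5) and the addition rule (1.6) can be verified exactly on the chess sample space; a sketch using exact fractions, assuming as before that all outcomes are equally likely:

import itertools
from fractions import Fraction

S = [''.join(t) for t in itertools.product('wl', repeat=3)]

def prob(event):
    # Classical formula: outcomes assumed equally likely.
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 'w'}  # first match is a win
B = {s for s in S if s[1] == 'w'}  # second match is a win

# Complement rule (1.5): P(A') = 1 - P(A)
assert prob(set(S) - A) == 1 - prob(A)

# Addition rule (1.6): P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A | B))  # 3/4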

1.4.2 Conditional Probability
If there are two events, A and B, where event B has some non-zero chance of occurring, the probability of A happening, given that B has already occurred, is given by the conditional probability

P(A|B) = P(A ∩ B) / P(B)    (1.7)
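Continuing with the chess example, a short sketch computing P(A|B), the probability that the first match is a win given that the second one is:

import itertools
from fractions import Fraction

S = [''.join(t) for t in itertools.product('wl', repeat=3)]
A = {s for s in S if s[0] == 'w'}  # first match is a win
B = {s for s in S if s[1] == 'w'}  # second match is a win

# Conditional probability (1.7): P(A|B) = P(A and B) / P(B)
p_B = Fraction(len(B), len(S))
p_AB = Fraction(len(A & B), len(S))
print(p_AB / p_B)  # 1/2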

1.4.3 Independent Events
Two events are said to be mutually independent if the occurrence of one event has no impact whatsoever on the occurrence of the other. That is,

P(A|B) = P(A) and P(B|A) = P(B)

From equation 1.7, it follows that

P(A ∩ B) = P(A) P(B)    (1.8)
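A minimal check of (1.8) on two tosses of a fair coin, assuming the four outcomes are equally likely:

import itertools
from fractions import Fraction

# Two tosses of a fair coin: 4 equally likely outcomes.
S = [''.join(t) for t in itertools.product('HT', repeat=2)]
A = {s for s in S if s[0] == 'H'}  # heads on the first toss
B = {s for s in S if s[1] == 'H'}  # heads on the second toss

def prob(event):
    return Fraction(len(event), len(S))

# Independence (1.8): P(A and B) equals P(A) * P(B)
print(prob(A & B) == prob(A) * prob(B))  # True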

Beginners often confuse independence with mutual exclusiveness. Note that if two events are mutually exclusive, the happening of one event precludes the other event from happening; on the other hand, if they are independent, both events can occur together, only they don't influence each other. As an example, consider the toss of a coin. The event of seeing a head prevents the observation of a tail on the same toss; these two are mutually exclusive. However, two tosses of the coin are independent; what I observe on the first toss doesn't tell me anything about what's going to happen on the second.

